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This  paper  presents  a tutorial  review  of  recent  advances  in  the  field  of  time-frequency  (t,  /)  signal 
processing  with  focus  on  exploiting  (t,  /)  image  feature  information  using  pattern  recognition  techniques 
for  detection  and  classification  applications.  This  is  achieved  by  (1)  revisiting  and  streamlining  the 
design  of  high-resolution  quadratic  time  frequency  distributions  (TFDs)  so  as  to  produce  adequate  (t,  /) 
images,  (2)  using  image  enhancement  techniques  to  improve  the  resolution  of  TFDs,  and  (3)  defining  new 
(t,  /)  features  such  as  (t,  /)  flatness  and  (t,  /)  entropy  by  extending  time-domain  or  frequency-domain 
features.  Comparative  results  indicate  that  the  new  (t,  /)  features  give  better  performance  as  compared  to 
time-only  or  frequency-only  features  for  the  detection  of  abnormalities  in  newborn  EEG  signals.  Defining 
high-resolution  TFDs  for  the  extraction  of  new  (t,  /)  features  further  improves  performance.  The  findings 
are  corroborated  by  new  experimental  results,  theoretical  derivations  and  conceptual  insights. 
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1.  Introduction 

1.1.  Non-stationary  signals 

There  has  been  a significant  effort  in  the  past  decades  to  de- 
velop Digital  Signal  Processing  (DSP)  methods  that  are  more  ap- 
propriate for  the  representation,  analysis  and  processing  of  non- 
stationary  signals  [1].  The  studies  initially  focused  on  the  design 
of  Spectrograms,  filter-banks  and  quadratic  methods  related  to 
the  Wigner-Ville  Distribution  (WVD)  for  optimal  representation  of 
signals  such  as  mono-component  Linearly  Frequency  Modulated 
(LFM)  signals  used  in  radar  [2],  sonar  [3],  geophysics  [4],  and  other 
applications  [5-8].  Researchers  then  shifted  their  focus  to  the  anal- 
ysis of  multi-component  signals,  resulting  in  a number  of  new  ap- 
proaches to  design  advanced  Time-Frequency  Distributions  (TFDs) 
and  related  techniques  [9-14,8].  Such  multi-component  signals 
provide  a more  precise  modeling  of  most  natural  signals,  includ- 
ing biomedical  signals  such  as  fetal  movement  signal,  phonocar- 
diogram  (PCG),  electrocardiogram  (ECG),  speech,  electroencephalo- 
gram (EEG),  heart  rate  variability  (HRV)  and  many  others.  A typical 
model  for  such  signals  was  originally  proposed  in  ([15],  p.  419). 
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Following  this  approach,  a multi-component  signal  of  time  dura- 
tion T and  bandwidth  B can  be  expressed  as: 

Nc  Nc 

s(t)  = £sk(t)  = J^ak(t)cos(</>k(t)),  (1) 

k= 1 k=  1 

where  Nc  is  the  total  number  of  components,  s/<(t ) is  the  kth  sig- 
nal component,  ajc(t)  and  are  the  instantaneous  amplitude 
(IA)  and  instantaneous  phase  (IP)  of  the  /cth  signal  component.  The 
IA  a/<(t)  is  a low-frequency  amplitude  whose  spectrum  is  assumed 
not  to  overlap  with  the  spectrum  of  the  higher-frequency  signal 
cos (0fc(t))  [16,17].  The  derivative  of  the  IP  //<(t)  = defines 

the  instantaneous  frequency  (IF)  of  the  /cth  signal  component.  The 
above  defines  the  AM-FM  model  of  the  signal  where  the  IA  is  the 
AM  law  and  the  IF  is  the  FM  law  of  the  signal.  The  time  support 
of  ajc(t)  is  T and  the  frequency  support  of  //<(t)  is  B. 

Using  the  above  model,  a multi-component  signal  is  completely 
characterized  by  the  number  of  components  NCf  the  IA  and  IF  of 
each  component;  residual  noise  could  be  added  in  certain  situa- 
tions when  there  are  interferences.  The  time-frequency  ( t , /)  ap- 
proach is  a preferred  method  to  represent  such  multi-component 
signals  as  it  allows  solving  the  problem  of  estimating  the  charac- 
teristics of  the  components  constituting  the  signal  in  Eq.  (1).  For 
example,  it  can  provide  a measure  of  the  energy  leakage  that  takes 
place  around  the  components  due  to  slow  variation  of  their  IAs, 
and  a measure  of  the  relative  energy  distribution  throughout  the 
(t,  /)  plane.  Such  information  can  then  be  used  to  refine  a number 
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of  advanced  techniques  and  methodologies  in  applications  such  as 
detection  and  classification  using  selected  ( t , /)  features  for  pat- 
tern recognition.  An  important  generic  application  of  such  methods 
is  diagnosis,  which  is  related  to  condition  monitoring  and  fault  de- 
tection. This  applies  to  both  machines  and  humans.  For  the  sake 
of  illustration,  we  will  use  physiological  signals  in  this  paper,  al- 
though the  method  can  be  applied  to  all  fields  where  decisions 
need  to  be  made  on  the  basis  of  information  collected  from  non- 
stationary signals. 

1.2.  Pattern  recognition  for  diagnosis , condition  monitoring  and  fault 
detection 

The  basic  idea  is  to  find  and  recognize  patterns  in  the  sig- 
nal that  would  indicate  the  presence  of  a fault,  an  abnormality 
or  otherwise  would  indicate  a normal  condition  for  the  process 
generating  the  signals.  If  there  is  an  abnormality,  then  one  needs 
to  identify  and  classify  the  abnormality.  Although  the  approach 
is  general,  this  paper  illustrates  it  by  focusing  on  some  particu- 
lar physiological  signals  such  as  electroencephalography  (EEG)  for 
monitoring  new-born  brain  activity  and  predicting  newborn  health 
outcomes  through  diagnosis  and  prognosis. 

1.2.1.  Acquisition  and  representation  of  real  signals 

Although  the  methodologies  presented  are  general,  this  paper 
considers  only  physiological  signals  for  illustration,  including  EEG, 
electrocardiography  (ECG),  heart  rate  variability  (HRV)  and  heart 
sounds  which  are  routinely  used  by  medical  specialists  for  diagno- 
sis in  a range  of  situations.  After  acquisition,  these  signals  are  usu- 
ally processed  in  several  stages,  including  a pre-processing  stage 
like  amplification  and  filtering  to  suppress  artifacts  and  noise.  In 
some  situations,  multi-channel  physiological  signals  are  acquired 
using  multiple  spatially  distributed  sensors  that  provide  both  tem- 
poral and  spatial  information.  The  diversity  given  by  space  time  in- 
formation can  allow  a more  refined  analysis  and  better  interpreta- 
tion of  the  data  in  applications  such  as  source  localization,  signal- 
to-noise  ratio  (SNR)  enhancement,  classification,  and  removal  of 
artifacts  [18,19].  As  indicated  earlier,  ( t , f)  analysis  is  widely  used 
for  the  representation  and  analysis  of  physiological  signals  because 
of  the  non-stationary  and  multi-component  characteristics  of  these 
signals;  the  (t,  /)  image  allows  a refined  treatment  of  further  pro- 
cessing stages  such  as  detection  and  classification,  leading  to  diag- 
nosis, condition  monitoring  and  fault  detection. 

1.2.2.  Automated  TF  detection  of  abnormalities  in  non-stationary 
signals 

Early  detection  of  abnormalities  in  physiological  signals  can 
help  save  or  improve  patients’  lives.  Manual  detection  of  abnor- 
mality requires  a constant  monitoring  of  the  relevant  physiolog- 
ical signal  by  a qualified  medical  expert.  The  development  of  an 
automatic  abnormality  detection  technique,  which  can  be  imple- 
mented on  a digital  computer,  would  result  in  a major  advance 
in  medical  practice.  Such  detection  of  abnormality  in  physiolog- 
ical signals  can  be  considered  as  a pattern  classification  prob- 
lem involving  extraction  of  features  from  the  signals  and  training 
of  a classifier.  Features  can  be  extracted  from  the  time-domain 
representation  [20-22],  the  frequency-domain  representation  [23], 
a combination  of  both  time-domain  and  frequency-domain  rep- 
resentations 24],  or  joint  time  frequency  representations  using 
TFDs  [7,25,5,26-30,13,31].  The  performance  of  abnormality  detec- 
tion techniques  depends  of  course  on  the  choice  of  signal  rep- 
resentation and  selection  of  features.  The  abnormality  detection 
techniques  that  are  based  on  TFDs  are  shown  to  outperform  time- 
domain  or  frequency-domain  only  approaches  as  TFDs  are  more 
adapted  to  analyze  the  non-stationary  characteristics  of  physiolog- 
ical signals  [26]  and  convey  critical  information  about  the  signal. 


Fig.  1.  The  proposed  (t,  f)  pattern  recognition  approach. 


TFDs  are  normally  compared  in  terms  of  their  resolution  and 
cross-term  suppression  properties.  Previous  studies  have  shown 
that  high-resolution  TFDs,  like  the  Modified  B-Distribution  (MBD), 
give  good  classification  results  [25,26].  A key  question  is  therefore 
whether  the  performance  of  TFD-based  signal  classification  tech- 
niques can  be  related  to  the  resolution  of  the  TFD.  This  paper  aims 
to  answer  this  question  by  investigating  the  influence  of  resolution 
parameters  on  classification  performance. 

TFDs  are  rich  in  information  partly  because  they  increase  the 
dimensionality  of  the  classification  problem  at  the  expense  of  in- 
creased computational  costs.  In  order  to  avoid  this  problem,  it  is 
important  to  extract  only  the  most  relevant  features  from  the  TFD. 
Such  features  can  be  extracted  in  a number  of  ways.  These  include 
dividing  a TFD  into  a number  of  tiles  and  computing  the  energy  in 
each  tile  [6],  reducing  the  dimension  of  the  TFD  by  linear  trans- 
formation methods  [5,30,31],  selecting  relevant  points  from  a TFD 
based  on  relevance  or  mutual  information  measure  [29],  translat- 
ing time-domain  or  frequency-domain  features  to  (t,  f)  domain 
and  extracting  relevant  features  [26],  and  using  image  processing 
techniques  to  extract  features  like  statistical  measures,  texture  re- 
lated features,  geometric  features,  ridges  [25,28],  to  name  just  a 
few. 

1.3.  Scope  and  key  topic  covered 

This  study  reviews  previous  work  on  the  design  of  high- 
resolution  TFDs  [32,10],  (t,  /)  image  processing  and  feature  extrac- 
tion methodology  [25,26]  with  three  objectives:  (1)  streamlining 
the  design  of  high-resolution  TFDs,  (2)  improve  diagnostic  applica- 
tions and  (3)  providing  the  reader  with  an  illustration  on  physio- 
logical signal  classification  techniques.  The  study  refines,  updates 
and  extends  previous  studies  with  improved  precision  and  illus- 
trates the  improvements  using  a real  application:  enhancing  new- 
born health  outcomes.  The  style  of  this  paper  is  a tutorial  review 
with  scope  and  coverage  defined  by  the  inclusion  of  the  following: 

1.  A streamlined  review  of  quadratic  (t,  f)  distributions  (QTFDs) 
formulations; 

2.  A methodology  for  the  design  of  new  high-resolution  QTFDs 
and  the  study  of  how  they  improve  the  performance  of  fea- 
tures extraction; 

3.  A review  of  multi-component  IF  estimation  techniques  as  a 
performance  measure  to  compare  TFDs; 

4.  A review  of  (t,  /)  image  processing  techniques  for  cross-term 
suppression,  resolution  enhancement  and  de-noising  as  a pre- 
processing stage  before  classification;  including  the  design  of  a 
(t,  /)  image  enhancement  technique  based  on  directional  fil- 
tering to  improve  the  resolution  of  QTFDs; 

5.  A formulation  of  (t,  f)  translated  features  from  time-domain 
only  and  frequency-domain  only  features; 

6.  Illustrative  experiments  to  apply  the  above  points  to  EEG 
seizure  detection  using  a large  medical  database. 

This  study  develops  a (t,  /)  pattern  recognition  methodology  to 
link  design  of  high-resolution  TFDs,  (t,  f)  image  enhancement 
techniques,  and  feature  extraction  stages  as  illustrated  in  Fig.  1. 
Note  that  the  authors  have  restricted  to  the  review  of  only  QTFDs 
because  of  their  high  resolution,  simple  interpretation  as  a dis- 
tribution of  signal  energy  in  a (t,  /)  plane  and  widespread  use. 
The  rest  of  the  paper  is  organized  as  follows.  The  design  of  high- 
resolution  QTFDs  is  discussed  in  Section  2.  Multi-component  IF 
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estimation  techniques  are  described  in  Section  3 and  compared  in 
terms  of  performance.  Then,  Section  4 presents  a review  of  image 
processing  techniques  for  enhancement  of  TFDs.  The  methodology 
for  extracting  (t,  /)  features  is  described  in  Section  5.  Experi- 
mental results  are  described  and  discussed  in  Section  6.  Finally, 
Section  7 concludes  the  paper.  Appendix  A describes  the  resolution 
measure  used  in  this  study  to  compare  the  performance  of  TFDs. 
The  issues  related  to  computational  cost  and  real  time  implemen- 
tation are  discussed  in  Appendix  B.  In  addition,  the  programs  that 
are  used  in  this  study  are  described  in  Appendix  C.  Appendix  D 
gives  a list  of  (t,  /)  features  obtained  by  extending  time-only  fea- 
tures to  the  (t,  /)  domain.  Appendix  E refers  the  reader  to  the  real 
data  used  in  this  study  which  appear  as  Supplementary  material. 

2.  Quadratic  time-frequency  signal  processing 

2.1.  Time  frequency  formulations  and  representations 


The  selected  compromise  for  this  trade-off  selection  depends  on 
the  signal  specifications.  High-resolution  TFDs  are  needed  for  accu- 
rate estimation  of  signal  parameters  such  as  the  IF,  total  number  of 
components  which  are  important  features  for  a number  of  applica- 
tions including  signal  modeling,  analysis  and  classification.  The  es- 
timation of  QTFDs  based  on  the  general  (t,  f)  formulation  requires 
one  FT  and  a two  dimensional  (2-D)  convolution  along  time  and 
frequency  axis.  Along  with  this  general  (t,  /)  formulation,  a QTFD 
can  be  expressed  starting  from  a time-lag  formulation,  Doppler- 
lag  formulation  or  a Doppler-frequency  formulation,  as  outlined  in 
the  next  sections. 

2.12.  Time-lag  formulation  of  QTFDs 

Eq.  (6)  can  be  estimated  via  time-lag  formulation  by  replacing 
the  convolution  operation  along  the  frequency  axis  with  the  multi- 
plication operation  along  the  lag  axis  ([33  , p.  67),  resulting  in  the 
following  expression: 


Let  us  consider  a non-stationary  real  signal  s(t)  as  defined  in 
Eq.  (1).  The  WVD  is  the  core  member  of  the  class  of  QTFDs.  The 
WVD  is  defined  by  taking  the  Fourier  transform  (FT)  of  an  instan- 
taneous auto-correlation  function  I<z(t,  r)  expressed  as  1]: 

Wz(t,f)  = Jl<z(t,T)e-2jnfTdT,  (2) 

R 

where  I<z(t,  r)  is  defined  as 

(3) 

and  where  z(t)  is  the  analytic  associate  of  a real  signal  s(t)  ob- 
tained by  the  use  of  the  Hilbert  transform;  expressed  as 


z(t)=s(t)  + j"H{s(t)}.  (4) 

In  the  above  expression,  the  imaginary  part  represents  the  Hilbert 
Transform  defined  by  H{s{t)}  = ^p.v.{/R  f^dr},  where  p.v.  repre- 
sents a principle  value  expressed  as 


1 f s 


s'r-dr  [ = lim 

T | S^O 


t—S  +00 

dr+f^ 
J t-T  J t- T 

oo  t+5 


dr 


(5) 


For  a real  AM-FM  signal  s(t)  = a(t)  cos(</>(t)),  the  analytic  signal 
z(t ) can  be  written  as:  z(t ) = a(t)e^^\  under  some  conditions 
such  as  when  a(t)  is  a low-frequency  amplitude  whose  spec- 
trum does  not  overlap  with  the  spectrum  of  the  high-frequency 
signal  [16,17].  For  such  signals  with  large  Bandwidth-Time- 
duration  (BT)  product,  then  the  IF  provides  an  estimate  of  the  FM 
law. 


2.1.1.  Direct  quadratic  time-frequency  formulation 

The  WVD  defined  in  Eq.  (2)  gives  ideal  concentration  for  mono- 
component LFM  signals,  but  it  produces  undesired  cross-terms  for 
non-linear  frequency  modulated  (FM)  or  multi-component  signals. 
Cross-terms  can  be  reduced  by  convolving  the  WVD  with  a 2D 
(t,  /)  kernel,  resulting  in  the  following  expression  [1], 


P(f,f)=  f G(t,  r)  * I(z(t,  r)  e 2j7TfTdr , (7) 

R " ' 

RzQ,t ) 

where  G(t,  r)  is  called  the  time-lag  kernel  of  the  TFD  and  is  re- 
lated to  y(t,  f)  by  inverse  FT.  The  time-lag  formulation  is  widely 
used  for  implementing  TFDs  because  of  its  conceptual  simplicity 
and  computational  efficiency  as  it  requires  only  one  convolution 
along  the  time  axis  and  one  FT  from  lag  domain  to  frequency  do- 
main [13]. 

2.1.3.  Doppler-frequency  formulation  of  QTFDs 

A Doppler-frequency  based  formulation  of  TFDs  can  be  ob- 
tained from  the  (t,  /)  formulation  by  replacing  the  time  convolu- 
tion in  Eq.  (6)  by  a multiplication  in  Doppler  after  taking  a FT  [33]: 

Pit,  f)  = f Giv,  f)*kz(v,  f)e2j7ZVtdv,  (8) 

R 

where  Q(v,  f)  is  the  kernel  in  the  Doppler-frequency  domain.  The 
quantity  kz(v,  /)  is  often  referred  to  as  the  spectral  autocorrelation 
function  (SAF)  defined  as  ([33  , p.  70): 

kz(v,/)  = z(7+0Z*(/-0,  (9) 

where  Z(/)  is  the  FT  of  z(t).  The  Doppler-frequency  TFD  formu- 
lation requires  one  convolution  along  the  frequency  axis  and  one 
inverse  FT  (from  Doppler  to  time)  to  transform  the  SAF  to  the  (t,  /) 
domain  representation.  The  computational  cost  of  transforming  the 
SAF  to  the  ( t , /)  domain  is  equal  to  that  of  transforming  the  in- 
stantaneous auto-correlation  function  to  the  (t,  /)  domain,  apart 
from  the  required  estimation  of  Z(/). 

2.1.4.  Doppler-lag  formulation  of  QTFDs 

A Doppler-lag  formulation  can  be  obtained  by  replacing  the 
convolution  operation  along  the  time  axis  in  Eq.  (7)  with  the 
equivalent  multiplication  along  the  Doppler-axis  ([33],  p.  69),  re- 
sulting in  the  following  expression: 


p(tJ)  = Y(t,f)**Wz(tJ)  (6) 

where  y(t,  f)  is  the  2D  (t,  /)  smoothing  kernel.  Eq.  (6)  represents 
the  (t,  /)  formulation  of  QTFDs.  The  2D  smoothing  of  the  WVD 
with  y(t,f)  reduces  the  cross-terms,  but  blurs  the  auto-terms. 
For  this  reason,  the  (t,  /)  kernels  are  designed  to  achieve  the  best 
tradeoff  between  the  following  two  conflicting  objectives: 


p(t,f)  = J J g(V , T)Az(V,  T)  e 2 j^/r +2  j7Ttv  d^  d v , 


Az(v,r) 


(10) 


where  Az(v,  r)  represents  the  ambiguity  function,  defined  as: 


Az(v,t)  = 


/ 


Kz(t,  r)e~2j7rtvd t, 


(H) 


a)  Minimize  cross-terms;  and  g(v,  t)  = /RG(t,  r)e  2jjrtl,dt  is  the  Doppler-lag  kernel.  This 

b)  Retain  the  resolution  of  auto-terms.  formulation  is  often  used  for  designing  high-resolution  TFDs 
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as  it  allows  entering  filter  specifications  in  the  formulation  of 
g(v,  r)  [33].  The  Doppler-lag  formulation  requires  three  FTs  as 
computing  the  ambiguity  function  from  the  time-lag  kernel  needs 
one  FT  and  transforming  the  ambiguity  function  to  the  TFD  re- 
quires two  FTs. 

These  four  formulations  are  illustrated  in  Fig.  3. 

The  selection  of  one  of  the  above  four  formulations  depends 
on  the  specifications  of  the  application  and  signal  characteristics 
such  as  duration  T,  bandwidth  B,  and  value  of  BT  product.  For 
example,  for  narrow-band  signals  (i.e.  with  small  B ),  Eq.  (8)  would 
be  preferable  to  use  ([33],  p.  70). 


2.1.5.  Which  TFD  to  use?  Spectrogram  as  a non-separable  QTFD , WVD, 
S-method  or  another  TFD? 

A question  asked  by  most  beginners  is  which  TFD  to  use  to  get 
started,  given  that  the  WVD  has  a good  resolution  with  a prob- 
lem of  cross-terms,  the  Spectrogram  has  a problem  of  resolution, 
but  no  apparent  cross-terms  and  other  QTFDs  appear  more  compli- 
cated. To  answer  this  question,  let  us  first  review  the  Spectrogram 
in  detail. 

The  Spectrogram  is  a simple  yet  effective  TFD  in  some  situa- 
tions. It  is  defined  as: 


|/ s(r)w(t-r)e  2j7r^Tdr 


(12) 


where  w(t)  is  the  analysis  window.  The  Spectrogram  is  a QTFD 
as  its  time-lag  kernel  can  be  expressed  as  the  instantaneous  auto- 
correlation function  of  the  window  ([33],  pp.  47-48). 

C(t,r)  = w*^t+^jw|t-0  (13) 

This  implies  that  the  (t,  /)  kernel  of  the  Spectrogram  y(t,  f ) is  the 
WVD  of  the  window  w(t),  and  equivalently  the  ambiguity  domain 
kernel  filter  g{v , r)  is  the  ambiguity  function  of  the  window  w(t) 
([33],  p.  76), 

K(t,/)  = J w*ft  + 0w(f_  ^)e~2]7TfTdr.  (14) 

R 
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Fig.  2.  The  relationships  between  equivalent  formulations  of  separable  (t,  /)  kernels 
in  several  dual  domains. 

SMs(t,  /)  = 2 j G(9)Fs(t,  f + 9)F*(t,  f - 9)d9,  (16) 

R 

where  Fs(t,  f)  is  the  short-time  Fourier  transform  of  s(t),  defined 
as: 


Fs(t , 


w(r)s(t  + r)e  j27r^Tdr. 


(17) 


In  Eq.  (16),  G(0)  is  a narrow  window  whose  length  controls  the 
cross-term  suppression  and  auto-term  resolution  properties  of  the 
TFD.  Previous  studies  have  shown  that  an  appropriate  selection  of 
the  length  of  G(6 ) can  combine  advantages  of  both  Spectrogram 
and  WVD.  The  resultant  TFD  can  have  auto-term  resolution  close 
to  the  WVD  with  a significant  suppression  of  cross-terms  [35].  The 
ambiguity  domain  kernel  of  the  S-method  is  given  below  13]. 


g(y,  r)  = G(v)* 


/"HM-jk' 


j2nuvdu 


= G(v)*  Aw(y,  t). 


(18) 


Eq.  (16)  shows  that  the  S-method  is  computed  using  the  short- 
time  Fourier  transform,  which  makes  it  more  computationally  ef- 
ficient than  other  QTFDs  such  as  the  WVD.  Let  us  now  review  the 
design  of  separable  kernel  TFDs. 


To  discuss  the  limitations  of  the  smoothing  kernel  of  the  Spec- 
trogram, let’s  consider  a Gaussian  window  w(t)  = (a/7r)1/4e_2t  . 
Using  Eq.  (14)  results  in  the  (t,  /)  kernel  for  this  window  ex- 
pressed as  34]: 

ni2  47T2/2 

Y(t,  /)  = 2e  at  e « . (15) 

The  above  expression  shows  that  the  smoothing  kernel  of  the 
Spectrogram  is  non-separable  given  that  both  t and  / variables 
are  parametrized  by  the  same  scale  value.  The  smoothing  along 
the  time  axis  cannot  be  controlled  independently  from  the  one 
in  the  frequency  axis  and  vice-versa;  this  makes  the  Spectrogram 
extremely  sensitive  to  the  length  of  the  window  ([33],  p.  221).  At- 
tempts to  mitigate  this  limitation  of  the  Spectrogram  include  using 
separable  kernel  TFDs  that  have  the  flexibility  to  independently  ad- 
just the  type  of  smoothing  along  the  time  or  frequency  axis  as 
discussed  in  Section  2.2. 

In  short,  the  Spectrogram  is  an  easy  to  use  QTFD  that  does  not 
have  a problem  with  cross-terms,  but  suffers  from  poor  resolution 
as  well  as  sensitivity  to  the  choice  of  window  length.  On  the  other 
hand,  the  WVD  has  higher  resolution  for  mono-component  LFM 
signals,  but  has  an  issue  of  cross-terms  that  may  be  problematic 
for  analysis  and  modeling. 

The  S-method,  which  can  be  considered  an  optimized  version 
of  the  Spectrogram,  combines  the  advantages  of  the  Spectrogram 
and  WVD.  It  is  defined  as  [35]: 


2.2.  High-resolution  TFDs  design  based  on  separable  kernels 

2.2.1.  Definition  of  separable  kernel  TFDs 

An  attempt  at  simplification  is  to  first  consider  (t,  /)  kernels 
that  can  be  represented  simply  as  the  product  of  an  indepen- 
dent time-only  kernel  with  an  independent  frequency-only  kernel; 
these  separable  (t,  /)  kernels  ([33],  pp.  213-222)  are  expressed  as: 

y(t,f)=gi(t)G2(f).  (19) 

The  meaning  and  significance  of  this  simplification  is  that  we  can 
define  and  design  some  QTFDs  simply  by  smoothing  the  WVD  in 
t and  then  in  /.  The  shape  and  size  of  gi(t)  or  G2(f)  deter- 
mines the  smoothing  along  the  time  or  frequency  axis  respectively. 
In  many  applications,  separable  (t,  /)  kernels  have  shown  to  give 
good  enough  results  in  terms  of  cross-term  suppression  and  auto- 
term resolution,  making  them  popular  as  they  are  easy  to  use. 
Other  equivalent  formulations  of  ( t , f)  separable  kernels  are  given 
below  in  other  related  domains. 

• Time-lag  domain:  G(t,  r)  = gi(t)g2(T) 

• Doppler-frequency  domain:  Q(v,  f ) = G^(v)G2(f) 

• Doppler-lag  domain:  g{v , r)  = G\(v)g2(z) 

Fig.  2 illustrates  the  relationships  between  the  above  formulations 
of  the  (t,  /)  kernels. 
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— G(t,r)  * Kz(t,r ) = &(v,f)  * kfy,f) 


A(v,  r)  = g(v , r)  A(v,  r) 

Fig.  3.  The  relationships  between  equivalent  formulations  of  TFD  in  several  dual 
domains. 

As  discussed  above,  high-resolution  separable  kernel  TFDs  can 
be  designed  in  the  four  possible  domains,  that  is:  (t,  /)  domain, 
Doppler-frequency  domain,  time-lag  domain  or  Doppler-lag  do- 
main; they  are  related  to  each  other  by  FT  as  illustrated  in  Fig.  3. 
The  ambiguity  domain  (lag-Doppler  domain)  is  usually  preferred 
for  designing  TFDs  as  filtering  in  the  ambiguity  domain  is  per- 
formed by  a multiplication  of  lag  and  Doppler  windows  instead 
of  one  or  two  convolutions  in  other  domains. 

2.2.2.  Cross-terms  in  the  WVD  and  ambiguity  domain 

The  WVD  defined  in  Eq.  (2)  is  often  considered  as  the  core  dis- 
tribution of  QTFDs;  it  suffers  from  the  presence  of  undesired  arti- 
facts or  cross-terms  that  prevent  its  straightforward  interpretation 
as  an  energy  distribution  versus  t and  /.  These  cross-terms  can 
be  classified  into  two  types:  outer- terms  and  inner- terms.  Outer- 
terms  can  be  easily  explained  by  considering  a two  component 
signal:  z(t)  = z^(t)  + Z2(0  for  which  the  WVD  is  given  by 

Wz(t,  f ) = WZ|  (t,  f ) + WZ2(t,  f ) 

+ 2Re  j J Z\  + Cjz*  _ ^je~2]7Z  fz dr  J . (20) 

R 

WZlz2(t,f) 

In  the  above  expression,  WZl(t,  /)  and  WZ2(t,f ) represent  auto- 
terms that  describe  an  energy  concentration  while  WZlZ2(t,  /)  rep- 
resents outer-term  artifacts  that  are  generated  by  the  interaction  of 
signal  components  zi(t)  and  Z2(t).  The  second  type  of  artifacts  is 
called  inner-terms;  they  are  generated  by  the  non-linear  FM  char- 
acteristic of  mono-component  signals. 

A non-linear  mono-component  signal  can  be  approximated  by 
the  summation  of  a number  of  piece-wise  LFM  signal  components. 
For  example,  a single  non-linear  FM  signal  component  z\  (t)  can  be 
written  as: 

Zl(t)=Zl,a(0  + Zl,b(t),  (21) 

where  zi>a(t)  and  zi^t)  represent  respectively  the  first  and  the 
second  pieces  constituting  the  signal  zi(t).  Its  WVD  is  given  by: 


inner-terms  is  partly  orthogonal  to  the  IF  of  a non-linear  FM  signal 
([33],  p.  63). 

The  ambiguity  function  is  often  used  for  designing  QTFDs  be- 
cause it  is  the  2D-FT  of  the  WVD,  which  then  allows  filtering  in  the 
(t,  /)  domain  by  a multiplication  in  the  ambiguity  domain.  Specif- 
ically, Eq.  (6)  and  Eq.  (10)  show  that  TFDs  can  be  expressed  as  the 
result  of  a double  convolution  of  (t,  f ) kernels  with  the  WVD  in 
the  (t,  /)  domain  or  as  the  2D-FT  of  the  product  of  the  Doppler- 
lag  kernel  with  the  ambiguity  function.  This  leads  to  the  following 
simple  procedure  for  designing  the  QTFDs. 

In  the  ambiguity  domain,  the  cross-terms  seem  to  appear  away 
from  the  origin  (see  Fig.  4(b))  and  are  separated  by  a distance  that 
is  approximately  equal  to  the  frequency  spacing  in  between  two 
signal  components  as  illustrated  by  Fig.  4(b).  On  the  other  hand, 
the  auto-terms  pass  through  the  origin  and  appear  between  two 
cross-terms  in  the  ambiguity  domain.  We  note  that  the  energies 
of  the  auto-terms  are  mostly  concentrated  around  the  origin  as 
illustrated  in  Fig.  4(b). 

QTFDs  can  then  be  defined  to  exploit  the  geometrical  proper- 
ties of  auto-terms  and  cross-terms  in  the  ambiguity  domain  by 
designing  kernels  using  the  “almost  low-pass”  characteristics  of 
auto-terms  and  the  high-pass  characteristics  of  cross-terms  in  the 
ambiguity  domain,  resulting  in  kernels  that  discriminate  and  sepa- 
rate them.  The  cross-terms  can  therefore  be  reduced  by  applying  a 
suitable  2D  filter  in  the  ambiguity  domain  and  then  transforming 
the  ambiguity  function  back  to  the  (t,  /)  domain. 

The  exact  specifications  of  a 2D  filter  depend  on  the  nature  of 
the  signal  being  analyzed.  For  example,  for  Fig.  4(b),  a 2D  filter  that 
is  elongated  along  the  lag  axis  with  low-pass  characteristics  along 
the  Doppler  axis  will  lead  to  a higher-resolution  TFD.  The  result  is 
shown  in  Fig.  4(c);  and  for  illustration  of  this  principle,  the  next 
subsection  presents  simple  examples  of  relevant  separable  kernels 
that  lead  to  high-resolution  QTFDs. 

2.2.3.  Principles  of  separable  kernel  TFDs  design 

To  illustrate  this  principle  in  clear  terms,  let  us  consider  two 
special  cases:  lag-independent  TFDs  and  Doppler-independent 
TFDs.  These  two  cases  of  separable  kernel  TFDs  can  be  used  in 
certain  specific  situations  related  to  signal  characteristics,  resulting 
in  easy  designs  and  implementations. 

a)  Lag-independent  (LI)  TFDs:  Such  lag-independent  TFDs  only 
perform  smoothing  along  the  time  axis  and  are  therefore  char- 
acterized by  G2(/)  = 8(f)  or  g2(t)  = 1.  This  kind  of  kernel 
is  suitable  for  signals  whose  IF  is  parallel  to  the  time  axis  in 
the  ( t , /)  domain.  In  the  ambiguity  domain,  such  signals  have 
most  of  their  auto-term  energy  concentrated  along  the  lag  axis 
as  these  signals  appear  as  an  impulse  along  the  frequency  axis 
in  the  (t,  /)  domain,  and  as  a constant  along  the  lag  axis  in 
the  ambiguity  domain.  The  MBD  kernel  represents  such  an  ex- 
ample of  LI-TFD.  It  is  defined  in  the  ambiguity  domain  by  the 
following  kernel  [1]: 


WZl(t,/)  = WZl  a(t,/)  + WZu(t,  /) 

+ 2Re  j J Z\,a(t+  0z*  b(t-  Cje~2j7TfTdx  . 

R 


(22) 

Cross-terms  are  oscillatory  in  nature,  with  the  same  order  of  mag- 
nitude as  that  of  auto-terms  [36  , and  they  lie  half-way  in-between 
two  signal  components  in  (t,  /)  plane  as  illustrated  in  Fig.  4(a). 
The  direction  of  oscillation  of  outer-terms  is  orthogonal  to  the  line 
joining  the  two  components  while  the  direction  of  oscillation  of 


g(v,r)  = Gi(v)  = 


ITQ8  + j7rv)|2 
r2  OS) 


|V|  < T,  0</S<l 

(23) 


where  v and  p are  bounded  to  ensure  that  the  MBD  is  a low- 
pass  filter.  Some  physiological  signals  like  HRV  and  EEG  seizure 
have  negligible  FM;  the  direction  of  oscillation  of  outer- terms 
for  such  signals  is  parallel  to  the  time  axis.  The  MBD,  a lag- 
independent  distribution,  is  naturally  suited  for  the  analysis  of 
such  signals  as  it  only  performs  smoothing  along  the  time  axis, 
thus  avoiding  the  blurring  of  auto-terms  in  all  directions  usu- 
ally caused  by  a 2D  smoothing.  To  illustrate,  let  us  consider  a 
multi-component  signal  composed  of  two  tones  and  one  LFM, 
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Fig.  4.  Illustration  of  inner-term  and  outer-term  artifacts  using  the  sum  of  a quadratic  FM  signal  and  a linear  FM  signal,  (a)  (t,  /)  domain  representation  using  the  WVD. 
(b)  Ambiguity  domain  representation,  (c)  ( t , f ) domain  representation  using  the  extended  modified  B distribution  (EMBD). 


and  analyze  it  using  the  MBD,  windowed  WVD  (discussed  in 
the  next  subsection),  and  Hamming-Hanning  kernel  TFD  [32] 
as  shown  in  Fig.  5.  The  MBD,  in  Fig.  5(a),  gives  high  energy 
concentration  for  the  two  tones  but  fails  to  give  the  same  level 
of  energy  concentration  for  the  LFM  component  as  it  performs 
smoothing  only  along  the  time  axis.  The  Hamming-Hanning 
kernel  TFD,  shown  in  Fig.  5(b),  gives  a blurred  representa- 
tion for  all  the  signal  components  due  to  its  2D  smoothing 
kernel.  The  windowed  WVD  in  Fig.  5(c)  fails  to  suppress  the 
cross-terms  due  to  its  frequency-only  smoothing  that  is  al- 
most orthogonal  to  the  direction  of  oscillation  of  cross-terms. 
Of  course,  for  all  the  TFDs,  further  improvements  can  be  made 
by  optimizing  window  size  and  window  shape  following  tra- 
ditional DSP  filter  design  criteria, 
b)  Doppler-independent  (DI)  TFDs:  Such  Doppler-independent 
TFDs  only  perform  smoothing  along  the  frequency  axis  and 
are  characterized  by  gi(t)  =S(t)  or  Gi(v)  = 1.  These  DI  types 
of  kernels  are  appropriate  for  signals  whose  auto-terms  are 
parallel  to  the  frequency  axis  in  the  (t,  /)  domain.  In  the 
ambiguity  domain,  such  signals  have  most  of  their  auto-term 
energy  concentrated  along  the  Doppler  axis.  A simple  example 
of  a DI-TFD  is  the  WVD  windowed  with  a Hamming  window 
along  the  lag  axis  expressed  as: 

g(v,  r)  = g2(x ) = 0.54  - 0.46cos(27TT),  -0.5  < r < 0.5, 

(24) 


where  r is  bounded  to  ensure  that  Hamming  is  a low-pass 
filter. 

The  DI-TFD  is  suited  to  real-world  signals  that  have  the  char- 
acteristics described  above,  like  for  instance  EEG  spike  signals 
which  can  be  modeled  by  a train  of  impulses  in  the  time- 
domain.  The  direction  of  oscillation  of  outer-terms  for  such 
signals  is  parallel  to  the  frequency  axis.  Methods,  such  as  the 
windowed  WVD  defined  by  Eq.  (24),  are  more  suitable  for 
the  analysis  of  this  type  of  signals  as  they  perform  smoothing 
along  the  frequency  axis  only  and  avoid  unnecessary  blurring 
caused  by  a complete  2D  filter.  Fig.  6 shows  the  MBD,  win- 
dowed WVD,  and  Hamming-Hanning  kernel  TFD  of  a signal 
composed  of  two  impulses  in  the  time  domain  and  an  LFM 
signal.  The  MBD,  shown  in  Fig.  6(a),  gives  poor  energy  con- 
centration for  all  the  signal  components  as  the  direction  of  its 
smoothing  kernel  is  orthogonal  to  the  direction  of  energy  con- 
centration of  the  signal  components.  The  Hamming-Hanning 
kernel  TFD,  shown  in  Fig.  6(b),  performs  2D  smoothing  and 
therefore  blurs  the  signal  auto-terms,  but  its  resolution  is  bet- 
ter than  the  MBD.  The  windowed  WVD,  shown  in  Fig.  6(c), 
results  in  a high-resolution  (t,  /)  representation  for  the  two 
impulses  but  it  fails  to  give  same  resolution  for  the  LFM  com- 
ponent as  it  only  performs  smoothing  along  the  frequency 
axis. 

Apart  from  the  LI  and  DI  TFDs,  most  of  the  other  separable 
kernel  TFDs,  like  the  B-distribution,  the  extended  Modified  B- 
distribution  and  the  Compact  support  kernel  employ  smooth- 
ing filters  along  both  time  and  frequency  axes  [33,32,10].  Some 
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Fig.  5.  (t,  /)  representations  of  a signal  composed  of  two  tones  and  one  LFM.  (a)  The  MBD  (ft  = 0.05).  (b)  The  Hamming-Hanning  kernel  TFD  (window  length  = 32  s for 
both  Doppler  and  lag  axes),  (c)  The  windowed  WVD  (rectangular  window  length  = 31  s). 


Fig.  6.  TFD  representation  of  a multi-component  signal  composed  of  two  impulses  and  one  LFM  in  the  ( t , f ) domain  using  the  (a)  MBD  (ft  = 0.5),  (b)  Hamming-Hanning- 
kernel  TFD  (window  length  = 32  s for  both  Doppler  and  lag  axes),  (c)  windowed  WVD  (rectangular  window  length  = 20  s). 


examples  of  these  particular  kernels  previously  used  in  the  lit- 
erature or  recently  developed  are  given  below. 

2.2.4.  Specific  separable  kernel  TFDs 
i.  B-distribution  Kernel  ([33],  p.  217):  The  B-distribution  kernel 
is  defined  in  the  ambiguity  domain  as  follows: 


g(V,T)  = g2(T)C1(v)  = 


, .p \r(j[+prv)f 

21-^r(2J8)  ’ 


|v|  < 0.5,  |r|  < 0.5,  and  0<  ft  <\, 


(25) 


where  r,  v and  /3  are  bounded  to  ensure  that  the  B- 
distribution  contains  a low-pass  filter. 
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Fig.  7.  (a)  The  EMBD  kernel  with  dominant  lag-direction,  (b)  The  EMBD  with  dominant  Doppler-direction. 


The  B-distribution  kernel  is  the  product  of  a low-pass  filter 
Gi(v)  = and  a high-pass  filter  g2(r)  = \t\P.  The 

B-distribution  gives  a high-resolution  TFD  for  selected  small 
values  of  p [37].  A disadvantage  is  that  a zero  at  the  origin 
appears  as  a consequence  of  the  high  pass  filtering  performed 
along  the  lag  axis  i.e.  g2(0)  = |0|^  = 0.  This  results  in  a deteri- 
oration of  the  resolution  of  the  B-distribution  for  certain  types 
of  signals  [32].  This  fact  motivated  its  modification  as  detailed 
below. 

ii.  Modified  B-distribution:  As  explained  above,  the  MBD-kernel 
was  designed  to  overcome  some  of  the  problems  of  the  B- 
distribution,  based  on  the  following  observations 

1 The  B-distribution  usually  gives  high  resolution  (t,  f ) rep- 
resentation for  very  small  values  of  p (0  < p <$;  1)  ([33], 

p.  118). 

2 For  small  values  of  p,  the  lag  window  is  approximately  an 
all  pass  filter  (except  for  r = 0 ),  i.e.,  g2(t)  ~ 1. 

The  kernel  of  the  MBD  includes  only  the  Doppler  window  of 
the  B-distribution  kernels  so  as  to  avoid  the  distortion  caused 
by  high  pass  filtering  along  the  lag-axis  as  shown  in  Eq.  (25). 
The  consequence  is  that  the  MBD  gives  the  highest  energy 
concentration  only  for  signals  whose  IF  does  not  vary  rapidly 
with  time,  such  as  EEG  seizure  signals  [38]. 

iii.  Hamming-Hanning  TFD:  The  corresponding  Hamming-Hanning 
kernel  is  a 2D  low-pass  filter  expressed  as  [32]: 


g(v,  r)  = (0.54  - 0.46  cos(27rr))  (0.5  - 0.5cos(7rv)), 

1111 

--  < V < -,  --  < T < — . 


(26) 


This  kernel  results  in  a high-resolution  TFD  obtained  by  choos- 
ing a 2D  low-pass  filter  along  both  lag  and  Doppler  axes.  This 
example  is  provided  merely  to  show  that  any  of  the  classical 
windows  used  for  spectral  analysis  or  digital  filter  design  can 
be  used  as  kernel  filters,  with  specifications  provided  in  the 
ambiguity  domain. 

iv.  Extended  modified  B-distribution:  As  discussed  earlier,  pre- 
vious studies  have  reported  that  the  kernel  of  the  lag- 
independent  MBD  mentioned  in  Eq.  (25)  gives  optimal  energy 
concentration  for  signals  with  negligible  FM  ([33],  pp.  213- 
222),  i.e.,  for  signals  whose  IF  is  almost  parallel  to  the  time 
axis.  The  drawback  of  the  lag-independent  formulation  of  the 
MBD  is  that  it  does  not  allow  smoothing  along  the  frequency 
axis,  thus  limiting  its  scope  for  the  analysis  of  signals  whose 


auto-terms  are  parallel  to  the  time  axis.  For  example  the  MBD 
cannot  be  used  for  the  analysis  of  a signal  that  is  composed 
of  a train  of  impulses  as  observed  in  some  newborn  EEG  sig- 
nals, as  it  then  would  give  poor  results  [1].  The  EMBD  exhibits 
improvement  in  some  situations  as  it  extends  the  MBD  by 
applying  its  kernel  filter  along  both  lag  and  Doppler  axes,  re- 
sulting in  the  expression  32]: 


g(v,  T)  = 


\r(P  + jnv)\2  \T(a  + jnx)\2 


r2(p) 


r2(a) 


(27) 


where  -0.5  < v < 0.5,  - 0.5  < r < 0.5,  0 < p < 1 and  0 < 
a < 1.  The  lengths  of  the  Doppler  and  lag  windows  are  con- 
trolled by  separate  parameters  a and  p respectively.  The  extra 
degree  of  freedom  in  the  formulation  of  the  EMBD  allows  to 
independently  adjust  the  lengths  of  the  windows  along  both 
lag  and  Doppler  axes  as  illustrated  in  Fig.  7.  This  makes  it  a 
more  useful  tool  for  the  analysis  of  real-life  signals,  such  as  fe- 
tal movement  signals,  EEG  signals  with  seizure  and  EEG  spike 
signals.  However,  the  EMBD  has  only  one  parameter  for  each 
kernel  filter  or  window  (that  is  a for  lag  and  p for  Doppler) 
to  control  both  shape  and  size.  Thus  the  EMBD  does  not  allow 
adapting  both  length  and  shape  of  the  smoothing  window  of 
the  kernel  filter  independently,  although  it  is  an  improvement 
on  both  the  BD  and  the  MBD. 

v.  Compact  support  kernel  TFD:  Such  compact  support  kernels 
(CSK)  are  designed  to  vanish  outside  a given  range  in  the  am- 
biguity domain.  Unlike  Gaussian  windows  they  do  not  have 
infinite  length,  so  there  is  no  need  to  truncate  those  using 
rectangular  windows  that  may  cause  loss  of  information.  These 
TFDs  have  been  shown  to  outperform  other  kernel-based 
methods  in  terms  of  their  ability  to  suppress  cross-terms  while 
retaining  the  resolution  of  auto-terms  in  some  cases  [10].  Such 
high-resolution  performance  is  achieved  by  these  kernels  by 
combining  their  compact  support  with  a flexibility  to  adjust 
both  shape  and  size  independently,  as  discussed  below. 

In  order  to  explain  the  characteristics  of  the  CSK,  let  us  first 
observe  in  the  formulation  below  that  its  kernel  includes  two 
components  [10]: 


g(v,  T)  = 


IcD2  . cD2 
(p-ce  v2— D2  t2-D2  ^ 

o, 


I v2<D2 
1 r 2 <D2 
otherwise 


(28) 


In  the  above,  the  two  branches  formed  by  the  Doppler  window 
and  lag  window  of  the  CSK  are  respectively  given  by 
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Fig.  8.  (a)  The  CSK  as  a Quasi-Rectangular  window,  (b)  The  CSK  as  a quasi-Gaussian  window. 


Fig.  9.  (a)  The  Spectrogram  (Bartlet  window  length  = 55  s).  (b)  Hamming-Hanning  kernel  TFD  (lag  window  length  = 128  s,  Doppler  window  length  = 0.07  Hz). 


f cD2 

Gi(v)  = 1 ecev2~D2 , 

\v\<D 

1 o. 

otherwise 

( cD 2 

gi(  T)=\ece*2-D2’ 

|T|  <D 

|o. 

otherwise 

The  above  equations  show  that  the  shape  and  size  of  both 
Doppler  and  lag  windows  are  determined  respectively  by  the 
parameters  c and  D.  These  parameters  allow  the  CSK  to  adapt 
both  kernel  length  and  shape.  For  example,  the  shape  of  the 
CSK  in  the  ambiguity  domain  can  be  adjusted  from  the  quasi- 
rectangular  window  to  the  quasi-Gaussian  window  as  shown 
in  Fig.  8.  However,  the  CSK  definition  restricts  the  Doppler  and 
lag  windows  to  be  of  same  length,  so  that  the  smoothing  along 
time  or  frequency  axis  cannot  be  adjusted  independently.  This 
limitation  is  overcome  by  a new  design  procedure  described 
in  Section  2.3. 

2.2.5.  Performance  comparison:  separable  kernel  TFD  vs  spectrogram 
To  illustrate  the  potential  advantage  of  using  separable  kernel 
TFDs  over  the  Spectrogram  in  terms  of  energy  concentration  and 
resolution  properties,  let  us  consider  a two  component  signal  de- 
fined as: 


s(t)  = 


cos  (2 n (0.2 1 - 0.0006t2))  + cos  (27T  (0.3t  + 0.0006t2)), 
0 < t < 127, 

0,  otherwise, 


(30) 


The  Spectrogram  and  Hamming-Hanning  kernel  based  TFD  for  this 
signal  are  shown  in  Fig.  9.  The  parameters  of  both  TFDs  were  op- 
timized based  on  visual  analysis. 

Fig.  9 shows  that  the  separable  kernel  TFD  gives  high  energy 
concentration  for  the  given  signal  as  compared  to  the  Spectrogram. 
This  improved  performance  of  the  Hamming-Hanning  kernel  based 
TFD  is  due  to  additional  flexibility  in  its  formulation  to  indepen- 
dently adjust  smoothing  along  time  and  frequency  axes. 

2.3.  An  advanced  TFD  design  with  improved  resolution:  the  CI<D 

2.3.1.  Non-uniform  compact  support  kernel 

Eq.  (29)  indicates  that  neither  the  shape  nor  the  size  of  the 
Doppler  and  lag  windows  can  be  adjusted  independently  of  each 
other.  This  shortcoming  limits  the  application  of  the  CSK  TFD  to 
the  analysis  of  signals  whose  energy  is  homogeneously  distributed 
in  the  ambiguity  domain.  As  a consequence,  the  CSK  TFD  cannot 
deal  optimally  with  signals  like  EEG  seizure  signals,  whose  IF  is 
almost  parallel  to  the  time  axis,  or  EEG  spike  signals,  whose  IF  is 
almost  parallel  to  the  frequency  axis. 

To  illustrate  the  limitations  of  the  CSK  TFD,  let  us  consider  a 
multi-component  signal  composed  of  two  LFM  signals.  Fig.  10(a) 
shows  the  ambiguity  function  modulus  of  this  signal  and  a CSK 
with  optimal  parameters.  The  CSK,  even  with  optimal  parameters, 
fails  to  retain  all  the  auto-term  energy  in  the  ambiguity  domain, 
thus  resulting  in  a blurred  TFD  as  shown  in  Fig.  10(b).  Fig.  10(c)  il- 
lustrates the  ambiguity  domain  filtered  by  a separable  kernel  that 
is  elongated  along  the  lag  axis  to  retain  all  auto-term  energy.  The 
separable  kernel  filters  cross-terms  while  retaining  all  the  auto- 
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Fig.  10.  Ambiguity  functions  and  TFDs  of  two  close  tones  using  a CSK  TFD  and  a separable  kernel  TFD.  (a)  The  optimized  CSK  in  ambiguity  domain,  (b)  CSK-TFD  with 
optimized  parameters,  (c)  Ambiguity  domain  representation  of  the  optimal  kernel  TFD.  (d)  TFD  representation  after  the  application  of  the  optimal  kernel. 


terms,  resulting  in  a higher-resolution  TFD  as  shown  in  Fig.  10(d). 
The  low-resolution  performance  of  the  CSK  is  due  to  the  restric- 
tion in  its  formulation  to  have  equal  length  for  both  Doppler  and 
lag  windows. 

2.3.2.  Derivation  of  the  extended  compact  support  kernel 

The  CSK  can  be  extended  by  simply  modifying  the  formulation 
of  Doppler  and  lag  windows  so  that  their  lengths  can  be  adjusted 
independently.  Such  modified  Doppler  and  lag  windows  can  then 
be  expressed  as: 


f CD2 

Gi  (y)  = | ecev2-Dl , 

|v|  < D 

l o. 

otherwise 

f CE2 

gi(  r)  = hce^2, 

|r|<£, 

o, 

otherwise. 

and 


(31) 


As  mentioned  above,  the  lengths  of  the  modified  Doppler  and  lag 
windows  now  depend  on  two  parameters  D and  E.  These  param- 
eters are  selected  on  the  basis  of  shape  and  orientation  of  auto- 
terms in  the  ambiguity  domain  so  that  the  maximum  energy  of 
auto-terms  is  retained  and  cross-terms  are  reduced.  This  leads  to 
the  following  formulation  of  the  Extended  Compact  Kernel  (ECK): 


g(V,T)  = G-t(v)g2(T) 

i cD2  I 

J v2_q2 

0, 


r 2— E2 


| v | < D,  |r|  < E, 
otherwise. 


(32) 


The  additional  degree  of  freedom  in  the  formulation  of  the  ECK  to 
independently  adjust  the  lengths  of  the  Doppler  and  lag  windows 
is  illustrated  in  Fig.  11. 

The  resulting  ECK  performs  smoothing  along  both  time  and  fre- 
quency axes.  Note  that,  like  many  other  TFDs,  it  does  not  rigorously 
fulfill  mathematical  properties  like  time  support,  frequency  sup- 
port, time  and  frequency  marginals,  and  positivity  ([33],  p.  216). 
However,  it  satisfies  the  mathematical  properties  of  realness,  en- 
ergy conservation  and  (t,  /)  shift  invariance  [10],  which  are  suffi- 
cient and  convenient  for  a number  of  applications  such  as  classifi- 
cation, as  justified  below. 

1.  Real  TFDs  decrease  the  dimensionality  of  the  classification 
problem  as  compared  to  complex  TFDs. 

2.  Time-shift  and  frequency-shift  invariance  are  needed  to  ensure 
the  consistent  performance  of  pattern  classification  algorithm. 

3.  The  conservation  of  the  total  energy  is  necessary  to  preserve 
the  relative  energy  between  features  of  different  classes  of 
data. 

The  TFD  defined  by  the  ECK  kernel  may  be  referred  to  as  ECK  TFD, 
but  this  abbreviation  is  further  shortened  to  Compact  Kernel  Dis- 
tribution (CKD)  for  simplicity. 

2.3.3.  Discrete-time  implementation  of  the  CKD 

The  time-lag  domain  is  often  used  for  the  discrete-time  imple- 
mentation of  TFDs  for  computational  efficiency  ([33],  pp.  268-278). 
The  discrete-time  time-lag  formulation  of  the  TFD  of  the  analytic 
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(a)  (b) 

Fig.  11.  (a)  The  ECK:  Quasi-similar  to  the  rectangular  window  with  a dominant  Doppler  direction,  (b)  The  ECK:  Quasi-similar  to  the  Gaussian  window  with  a dominant  lag 
direction.  (The  dominant  direction  is  shown  by  the  red  arrow).  (For  interpretation  of  the  references  to  color  in  this  figure  legend,  the  reader  is  referred  to  the  web  version  of 
this  article.) 


associate  z[n]  of  the  N-point  real  signal  s[n]  can  be  expressed  as  a)  A synthetic  multi-component  signal  composed  of  two  LFM 
([33],  pp.  237):  components  defined  as: 


p[n,  k]  = 2 DFT(G[n,  m]  *(z[n  + m]z*[n  - m ])).  (33) 

m^ir  n v n 

The  discrete-time  variable  n and  discrete-frequency  variable  /<  are 
related  to  the  continuous-time  and  continuous-frequency  variables 
by  the  following  expressions  ([33],  pp.  232-241) 

n = tfs  and  k = f^~,  (34) 

Is 

where  fs  is  the  sampling  frequency.  The  resulting  discrete  TFD  is  a 
matrix  of  size  NxM,  where  N is  the  number  of  time-domain  sig- 
nal samples,  and  M is  the  number  of  frequency-domain  samples. 
Following  Eq.  (32),  the  discrete  time-lag  formulation  of  the  ECK  is 
expressed  as: 


cD 2 cE 2 

G[n,  m]  = DFT(e  (w)2-d2  )e2ce  ^)2~e2  . (35) 

Its  properties  are  discussed  in  the  next  section  with  respect  to  per- 
formance. 


2.3.4.  Performance  evaluation  of  the  CKD 
Let  us  first  consider  an  LFM  signal  as: 

s(t)  = 4 rectj^J  Cos^2jr^/Ot+  |t2^  +V^,  (36) 

where  A is  the  amplitude  (assumed  to  be  equal  to  1),  T is  the  total 
duration  and  represents  a phase  offset  {if  is  often  equal  to  0), 
and  rect[y]  represents  a rectangular  window  defined  by: 


rect[r]  = 


1 

0, 


_1  < r < 1 
2 - Z - 2 

otherwise. 


(37) 


The  frequency  rate  a is  estimated  by  the  difference  between  the 
maximal  frequency  /max  and  the  initial  frequency  /o,  divided  by 
the  total  duration  T : 


The  sampling  frequency  is  1 Hz,  the  signal  duration  is  equal 
to  T = 255  s and  the  frequency  rate  a for  both  signals  is  the 
same  and  is  given  by:  a = /max'1r~/o'1  = (02335-01)  = 0.0006, 
where  /max,  1 and  /o,i  are  respectively  the  maximal  frequency 
and  the  initial  frequency  of  the  first  LFM  signal. 

b)  An  EEG  seizure  signal  of  8-s  duration  sampled  at  32  Hz. 

c)  A fetal  movement  signal  of  2.5-s  duration  sampled  at  100  Hz. 

These  signals  are  analyzed  using  the  TFDs  defined  in  previous  sec- 
tions, including  the  EMBD,  S-method,  CSK-TFD,  MBD,  WVD,  and 
CKD.  The  results  are  shown  in  Figs.  12,  13  and  14.  The  parameters 
of  these  TFDs  for  the  synthetic  signals  are  optimized  with  respect 
to  Boashash-Sucic’s  criterion  [38]  while  for  the  real-life  signal, 
their  parameters  are  optimized  on  the  basis  of  visual  inspection 
(see  Appendix  A for  the  definition  of  this  resolution  measure). 

The  resulting  ‘P’  values  for  the  TFDs  of  the  synthetic  signal  are 
shown  in  Table  1.  On  the  basis  of  both  visual  and  quantitative  anal- 
ysis, the  CKD  outperforms  all  the  other  TFDs  for  all  these  signals 
due  to  a better  ability  to  reduce  cross-terms  while  maintaining  a 
high  resolution  of  auto-terms.  The  S-method  is  the  second  best 
performing  TFD  for  synthetic,  real-life  EEG  and  fetal  movement  sig- 
nals. Note  that  the  performance  of  the  MBD  can  be  significantly 
improved  by  using  a rectangular  window  of  length  smaller  than 
the  signal  length,  but  such  modification  will  make  the  MBD  a lag 
dependent  distribution,  i.e.,  an  EMBD. 

The  CKD  gives  the  best  performance  for  signals  whose  auto- 
terms are  nearly  parallel  to  either  time  or  frequency  axis.  However, 
its  performance  degrades  for  signals  whose  auto-terms  have  a spe- 
cific direction  away  from  the  time  axis  or  frequency  axis  in  the 
(t,  /)  domain.  In  such  cases,  the  optimum  can  be  reached  us- 
ing directional  filtering  as  a post-processing  operation  to  design 
a high-resolution  TFD  as  detailed  in  Section  4.2. 


a — 


/max  /o 
T 


3.  Performance  assessment  for  multi-component  instantaneous 
(38)  frequency  estimation 


The  performance  of  the  CKD  defined  above  is  evaluated  using  the  For  stationary  signals,  frequency  estimation  is  used  in  a number 

following  signals:  of  applications  including  music  and  speech  to  estimate  the  relative 
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Fig.  12.  The  TFDs  of  a multi-component  signal  composed  of  two  LFM  signals,  (a)  The  CKD  (c  = 1,  D — 0.05,  E — 0.12).  (b)  The  EMBD  (a  = 0.18,  ($  = 0.06).  (c)  The  CSK-TFD 
(c  = 0.6,  D = 0.05).  (d)  The  WVD.  (e)  The  MBD  (fi  = 0.16).  (f)  The  S-method  (Hamming  window  length  = 4 s). 


strengths  of  each  frequency  component  [39].  For  non-stationary 
signal  processing,  the  method  extends  to  IF  estimation  which  rep- 
resents the  variation  of  the  dominant  frequency  as  a function  of 
time  17].  This  concept  of  IF  is  intrinsically  linked  to  the  concept  of 
TFDs  as  the  IF  estimation  capability  of  TFDs  is  often  used  as  a cri- 
terion to  evaluate  their  performance,  including  robustness  against 
noise.  The  performance  of  the  previously  defined  TFDs  is  compar- 
atively assessed  below  for  both  simulated  and  real-world  signals 
in  terms  of  IF  performance  estimation.  This  requires  first  to  define 
clearly  what  is  being  estimated  and  the  criteria  for  comparison. 
Let  us  consider  first  the  simple  case  of  mono  component  signal  IF 
estimation. 

3.1  Meaning  and  estimation  of  the  instantaneous  frequency  for 
mono-component  signals 

In  TFD  based  IF  estimation  the  criteria  of  goodness  for  mono 
component  signals  are:  no  bias  and  minimum  variance.  These  IF 
estimates  can  be  defined  as  the  peaks  of  TFDs  or  as  their  first  mo- 
ments. One  aim  of  defining  high-resolution  TFDs  is  to  also  yield 
high  concentration  TFDs,  resulting  in  efficient  IF  estimates. 

As  the  concept  of  IF  was  introduced  to  represent  the  spectral 
variations  of  non-stationary  signals  [17],  a simple  example  is  the 
FM  signals,  for  which  the  IF  is  the  FM  law.  The  easiest  is  the  LFM, 


which  is  used  in  various  applications,  such  as  “vertical  seismic 
prospecting”  (VSP),  where  it  is  desired  to  study  parameters  such 
as  soil  absorption  and  dispersion  4].  For  such  signals,  the  WVD  is 
the  optimal  representation  in  terms  of  energy  concentration. 

Let  us  consider  a mono-component  analytic  signal  z(t ) modeled 
as  z(t ) = a(t)eJ<^^\  such  that  a(t)  is  a low-frequency  signal  whose 
spectrum  does  not  overlap  with  the  high-frequency  signal  as 
per  Eq.  (1)  and  <p(t)  is  defined  by  <p(t)  = arctan(Im(z(t))/Re(z(t)). 
The  IF  of  the  mono-component  signal  is  defined  as  the  time  deriva- 
tive of  its  phase: 


m = 


i dm 

2n  dt 


(40) 


A comparative  review  of  fundamental  algorithms  for  IF  estimation 
appeared  in  [40].  More  recent  advances  are  reviewed  next,  setting 
the  stage  for  the  more  difficult  case  in  next  section  that  is  used 
as  a test  for  comparing  the  new  TFDs  defined  in  the  previous  sec- 
tion. 


3.2.  IF  estimation  for  mono-component  signals 

The  main  (t,  f)  approach  to  IF  estimation  follows  the  simple 
principle  that  the  IF  of  a mono-component  signal  at  each  time  in- 
stant can  be  estimated  by  detecting  the  location  of  peaks  along  the 
frequency  axis  in  the  (t,  /)  plane. 


B.  Boas  hash  et  al.  / Digital  Signal  Processing  40(2015)  1 -30 


13 


2 4 G 8 ID  12  14  IG 

Frequency(Hz) 


(a) 


2 4 B 8 ID  12  14  IG 

Frequency(Hz) 


(c) 


Frequency(Hz) 

(e) 

Fig.  13.  TFDs  of  a seizure  signal,  (a)  The  CKD  (c  = 1,  D = 0.2,  E = 0.05).  (b)  The  EMBD 
(l 8 — 0.03).  (f)  The  S-method  (Hamming  window  length  = 4 s). 

Table  1 

Resolution  performance  measure  P of  selected  TFDs  for  a multi-component  signal 
composed  of  two  parallel  LFM  signals  (The  optimized  parameters  are  obtained  using 
an  exhaustive  search  of  the  parameter  that  maximizes  the  resolution  measure;  it  is 
not  “optimal”  in  the  classical  sense.  See  details  in  Appendix  A). 


Time  frequency  distribution 

Optimized  parameters 

P 

Compact  Kernel  Distribution 

C = 1,  D — 0.05,  E = 0.125 

0.8919 

Extended  Modified  B distribution 

cy  = 0.18,  £ = 0.06 

0.8376 

Compact  Support  Kernel 

C = 0.6,  D = 0.05 

0.7854 

Wigner-Ville  Distribution 

Not  available 

0.7063 

Modified  B-distribution 

P = 0.16 

0.7628 

S-Method 

Hamming  window  length  = 4 s 

0.8778 

Several  algorithms  are  based  on  the  above  principle.  For  ex- 
ample, the  WVD  gives  an  ideal  estimate  for  mono-component  LFM 
signals  so  it  can  be  used  as  an  IF  estimator  in  such  cases.  However, 
its  estimate  is  biased  for  non-linearly  FM  signals  [41],  and  so  other 
methods  are  needed.  In  the  general  case,  the  bias  can  be  overcome 
by  estimating  the  IF  from  a windowed  WVD,  but  windowing  in- 
creases the  variance  of  the  IF  estimate.  An  optimal  window  length 
for  an  optimum  bias-variance  tradeoff  at  each  time  instant  can  be 
estimated  adaptively  [41  ].  Following  this  approach,  multiple  WVDs 
can  be  used  to  obtain  multiple  estimates  of  the  IF  as  well  as  the 
confidence  interval  for  each  estimate.  Starting  from  the  smallest 
window,  the  confidence  intervals  of  the  IF  estimates  of  successive 
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(i a = 0.02,  ft  = 0.75).  (c)  The  CSK-TFD  (c  = 0.1,  D = 0.06).  (d)  The  WVD.  (e)  The  MBD 

windows  are  tested  for  overlap  till  the  intersection  of  the  confi- 
dence intervals  of  two  successive  windows  results  in  a null  set. 
The  IF  estimate  of  the  longest  window  fulfilling  the  criterion  of 
overlapping  of  confidence  intervals  is  then  selected  as  an  optimal 
estimate.  This  method  is  known  as  the  ICI  rule  where  ICI  means  in- 
tersection of  confidence  intervals  [41  . A modified  approach  further 
improves  the  accuracy  of  the  ICI  algorithm  by  taking  into  account 
the  amount  of  overlap  between  two  successive  confidence  inter- 
vals. Other  iterative  IF  estimation  algorithms  can  obtain  the  initial 
estimate  of  the  IF  from  the  Spectrogram  [42].  The  IP,  defined  as 
the  integration  of  the  IF,  is  then  used  to  demodulate  the  original 
signal.  The  demodulation  shifts  the  spectrum  of  the  original  sig- 
nal around  the  zero  frequency.  The  IF  of  the  demodulated  signal 
is  then  estimated  again  and  this  process  is  iterated  till  there  is  no 
further  change  in  two  consecutive  estimates  of  the  IF. 

Most  QTFDs  fail  to  accurately  estimate  the  IF  at  very  low  SNR. 
In  such  scenarios  improvements  can  be  obtained  by  taking  into  ac- 
count the  main  direction  in  the  (t,  /)  plane  where  there  is  most 
energy  concentration.  For  example,  the  adaptive  combination  of 
the  directionally  smoothed  WVDs  was  shown  to  obtain  a (t,  /) 
representation  that  has  high  concentration  of  signal  energy  along 
the  IF  of  the  signal  even  at  a very  low  SNR  [43].  Such  methods 
need  to  be  further  extended  to  deal  with  the  more  general  multi- 
component  case  as  addressed  in  the  next  section. 
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Fig.  14.  TFDs  of  a fetal  movement  signal,  (a)  The  CKD  (c  = 1,  D = 0.075,  E = 0.075).  (b)  The  EMBD  (a  = 0.125,  ft  = 0.075).  (c)  The  CSK-TFD  (c  = 1,  D = 0.075).  (d)  The  WVD. 
(e)  The  MBD  (ft  = 0.05).  (f)  The  S-method  (Hamming  window  length  = 4 s). 


3.3.  IF  estimation  for  multi-component  signals 


For  a multi-component  signal,  the  (t,  /)  approach  follows  the 
principle  that  a multi-component  analytic  signal  may  be  expressed 
as  the  sum  of  Nc  mono-component  signals,  i.e., 


Nc  Nc 

Z(t)  = X>(t)  = X>(t)e#i(t)  (41) 

i=l  i=l 

The  IF  of  the  ith  signal  component  is  defined  by  the  time  deriva- 
tive of  its  phase  as 


fi(t)  = 


i mt )_ 

2n  dt 


(42) 


If  the  individual  components  of  the  signal  are  FM  signals,  then  the 
multi-component  signal  appears  as  multiple  ridges  in  the  (t,  /) 
plane.  Multi-component  IF  estimation  algorithms  need  then  to  in- 
volve both  detection  and  linking  of  related  peaks  to  estimate  the 
IF  of  each  signal  component.  The  linkage  stage  is  needed  so  that 
the  IF  of  each  component  can  be  tracked  accurately  and  without 
mix-up  with  the  IFs  of  other  components. 


3.3.1.  IF  linking  algorithms 

Image  processing  techniques  can  be  used  to  link  all  the  points 
of  each  IF  in  the  (t,  /)  domain  [44].  The  key  stages  involve: 


1.  Detecting  peaks  in  a TFD  by  using  first  and  second  order  par- 
tial derivatives  along  the  frequency  axis. 

2.  Linking  the  detected  peaks  using  some  neighborhood-con- 
nectivity criteria  [44].  For  example,  using  a 10-neighborhood 
connectivity  criterion,  a peak  belonging  to  the  IF  of  a signal 
component  must  have  at  least  one  other  peak  of  the  IF  of  the 
same  signal  component  in  its  10-neighborhood  and  it  should 
not  have  any  peak  of  the  IF  of  any  other  signal  component  in 
this  selected  neighborhood.  The  10-neighborhood  of  a peak  at 
the  location  (x,  y)  is  defined  as  a set  of  all  pixels  at  the  fol- 
lowing locations  [44]. 

f (x+  1,  y)  (x+l,y  + l)  (x  + 1 , y + 2)  (x+l,y-l)  (x+l,y-2)l 

l(x-l,y)  (x  — 1,  y + 1)  (x  — 1 , y — 2)  (x-l,y-l)  (x-l,y-2)J 

(43) 

This  image  processing  linking  algorithm  is  fully  automatic  and  does 
not  require  prior  information  like  the  number  of  components,  their 
IF  laws  and  SNR. 

3.3.2.  Multi-component  IF  estimation  based  on  component  extraction 
and  ICI  algorithm 

Let  us  consider  for  illustration  another  method,  the  ICI  based 
IF  estimation  algorithm  which  was  extended  to  multi-component 
signals  by  first  extracting  signal  components  using  a blind  source 
separation  algorithm  45].  This  algorithm  iteratively  extracts  signal 
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Fig.  15.  (a)  The  CKD  (c  = 1,  D = 0.2,  E — 0.05)  of  an  EEG  signal,  (b)  The  IFs  of  signal  components  estimated  using  the  method  described  in  [44]. 


(b) 


Fig.  16.  (a)  The  MSE  of  the  IF  estimates  of  si(t)  obtained  using  high-resolution  TFDs  vs.  SNR.  (b)  The  MSE  of  the  IF  estimates  of  S2(t)  obtained  using  high-resolution  TFDs  vs. 
SNR.  In  all  these  cases  the  IFs  are  estimated  using  the  component  extraction  procedure  [47]. 


components  from  the  (t,  /)  plane  till  the  total  energy  of  the  signal 
falls  below  a certain  threshold.  The  disadvantage  of  this  algorithm 
is  its  dependence  on  prior  information  like  total  number  of  signal 
components  or  SNR  of  a signal. 

A comparative  study  has  shown  that  the  above  image  pro- 
cessing algorithm  based  on  connected  component  linking  is  more 
suitable  for  estimating  the  IF  of  the  physiological  signals  consid- 
ered as  it  is  computationally  efficient  as  compared  to  the  com- 
ponent extraction  method  [46].  Fig.  15  illustrates  an  example  of 
IF  estimation  of  a multi-component  EEG  signal  using  the  im- 
age processing  technique  described  in  [44].  Both  ICI  with  blind 
source  separation  and  connected  component  linking  algorithms 
can  only  be  applied  to  (t,  /)  separable  signals,  i.e.,  signals  whose 
components  do  not  overlap  with  each  other  in  the  (t,  /)  do- 
main. Many  of  the  real-world  signals  do  not  fulfill  this  property 
so  these  methods  are  not  general  and  should  be  applied  only 
to  relevant  classes  of  signals,  therefore  requiring  a-priori  knowl- 
edge. 

In  order  to  design  high  performance  multi-component  IF  es- 
timation algorithms  for  the  general  case,  it  is  important  to  re- 
member that  the  performance  of  multi-component  IF  estimation 
algorithms  depend  on  the  resolution  and  cross-term  suppression 
capabilities  of  the  TFD  employed.  Recent  studies  have  indicated 
that  data-dependent  high-resolution  TFDs  like  the  Adaptive  Frac- 
tional Spectrogram  (AFS)  [47]  and  variable-bandwidth  filter  based 
heterogeneous  TFD  [9]  can  outperform  other  TFDs  in  terms  of  their 
ability  to  accurately  estimate  the  IFs  of  closely  spaced  signal  com- 
ponents at  low  SNR.  The  improvement  is  naturally  obtained  using 
an  adaptive  procedure  so  as  to  get  a “perfect  match”  with  the  data. 
The  previous  TFDs  are  therefore  compared  below  in  terms  of  their 
IF  estimation  performance.  The  AFS  is  included  in  the  comparison 
as  an  example  of  data  dependent  TFD. 


3.3.3.  Comparison  of  TFDs  in  terms  of  their  multi-component  IF 
estimation  performance 

In  order  to  evaluate  the  IF  estimation  capability  of  the  TFDs 
designed  above,  let  us  consider  a two-component  real  signal  ex- 
pressed as: 


S(t)  = Si(t)+S2(t), 

(44) 

where 

si  (t)  = rect 

[t-r,28]coS(2„(o,r+°;5t>)). 

(45) 

s2  (t)  = rect 

[%,28]c„s(2*(o,5t+°;V)), 

(46) 

with  time  duration  T = 256  s and  sampling  frequency  fs  = 1 Hz. 
The  corresponding  IFs  are 


fi(t)  = 0.3y  +0.1,  (47) 

f2(t)  = 0.3^ +0.15.  (48) 

The  TFDs  considered  are  the  EMBD,  Spectrogram,  CKD  and  AFS; 
for  the  latter  the  IF  is  estimated  using  the  component  extraction 
procedure  as  discussed  in  [47].  The  mean  square  error  (MSE)  is 
estimated  by  performing  200  Monte  Carlo  simulations  [48].  Fig.  16 
displays  the  results  of  the  IF  estimates.  The  results  indicate  that 
the  CKD  gives  more  accurate  estimates  of  the  IFs  as  compared  to 
other  non-adaptive  TFDs,  and  naturally,  the  AFS  does  even  better 
due  to  the  adaptation. 

The  above  findings  indicate  that  an  accurate  estimation  of  the 
IF  of  the  signal  components  is  directly  related  to  the  cross-term 
suppression  and  auto-term  resolution  properties  of  (t,  f)  methods 
employed.  Therefore,  there  is  a need  to  develop  better  methods 
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for  improving  the  resolution  of  TFDs.  The  following  section  will 
discuss  image  processing  methods  for  improving  the  resolution  of 
TFDs. 

4.  Time-frequency  image  processing  techniques  for  TFD 
enhancement 

Clarity  of  representation  is  essential  for  TFDs,  so  that  their  rele- 
vant features  can  be  read,  selected  and  extracted.  As  all  QTFDs  suf- 
fer from  limitations  such  as  inherent  compromise  between  cross- 
term suppression  and  auto-term  resolution,  it  may  adversely  affect 
the  extraction  of  some  relevant  features  from  ( t , /)  images  as  well 
as  components  separation.  To  improve  the  readability,  let  us  con- 
sider the  TFD  as  a (t,  f)  image.  Then,  image  processing  techniques 
can  be  used  as  a processing  stage  to  either  improve  the  resolu- 
tion of  cross-term-free  low-resolution  TFDs  (e.g.,  the  Spectrogram) 
using  de-blurring  methods  or  further  suppress  cross-terms  in  high- 
resolution  TFDs  (e.g.,  the  WVD).  Section  4.1  reviews  existing  image 
processing  techniques  for  TFD  image  enhancement,  and  Section  4.2 
presents  an  improved  novel  image  processing  technique  for  im- 
proving the  resolution  of  the  class  of  QTFDs  considered  and  there- 
fore allowing  improved  multi-component  IF  estimation. 

4.1.  TFD  enhancement  using  image  processing  techniques 

Given  the  number  of  different  ways  to  generate  a high- 
resolution  TFD  and  the  imperfections  associated  with  each  one  of 
them,  a natural  approach  is  to  process  the  TFD  image  using  image 
processing  techniques  to  improve  the  clarity  of  the  image.  Three 
types  of  processing  are  reviewed  in  the  following  sections:  (t,  /) 
image  de-blurring,  ( t , f)  de-noising  and  (t,  f ) image  artifact  sup- 
pression. 

4.1.1.  Time-frequency  image  de-blurring 

Image  de-convolution  techniques  are  used  for  the  estimation  of 
high  resolution  images  from  blurred  ones.  For  example,  these  tech- 
niques have  been  applied  to  estimate  an  original  high-resolution 
TFD  from  the  Spectrogram,  which  can  be  interpreted  as  a blurred 
version  of  the  true  (t,  /)  image  [34].  Another  method  reduces 
the  blurring  in  the  Spectrogram  by  using  artificial  neural  net- 
works [49].  This  technique  estimates  the  inverse  of  the  blurring 
function  using  neural  networks,  which  are  trained  using  a pre- 
computed Spectrogram  and  the  WVD.  The  trained  neural  networks 
are  then  applied  to  the  Spectrogram  to  estimate  the  true  high- 
resolution  TFD.  The  disadvantage  of  this  technique  is  that  it  results 
in  a discontinuous  TFD  [49]. 

4.1.2.  Time-frequency  image  de-noising 

One  way  to  enhance  (t,  /)  images  is  to  filter  out  noise  to  im- 
prove their  SNR,  using  image  de-noising  techniques.  Such  tech- 
niques often  assume  an  additive  white  noise  model  to  account  for 
the  distribution  of  noise  in  images.  Unfortunately,  this  model  is 
not  valid  in  general  for  TFDs  as  earlier  studies  have  shown,  that 
apart  from  the  WVD,  noise  is  not  uniformly  distributed  in  (t,  f) 
images  [50].  The  influence  of  noise  is  stronger  in  the  region  of  sup- 
port of  auto-terms  as  noise  appearing  outside  the  region  of  support 
of  the  auto-terms  is  reduced  due  to  the  low-pass  filtering  applied 
in  the  ambiguity  domain.  So,  (t,  /)  image  de-noising  algorithms 
should  be  designed  by  taking  into  account  the  characteristics  of 
noise  distribution  in  QTFDs. 

Some  of  the  image  de-noising  algorithms  that  have  been  ap- 
plied to  (t,  f)  images  are  discussed  below. 

a)  Singular  Value  Decomposition  (SVD)  based  methods:  As  sig- 
nals represented  in  the  (t,  f)  domain  are  often  characterized 
by  just  a few  components,  then,  the  resulting  TFD  p(n,k)  can 


be  considered  as  a sparse  matrix.  Such  TFD  is  in  general  not 
full  rank  and  its  SVD  can  be  expressed  as  [51]: 

p(n,k)  =UDVh,  (49) 

where  U is  an  orthogonal  matrix,  VH  is  the  conjugate  trans- 
pose  of  the  orthogonal  matrix  V,  and  D = diag(ai,«2,  • • • , 
ofjv-r+i,  • • • , aw)  is  a diagonal  matrix  in  which  the  last  r sin- 
gular values  are  equal  to  0 ([aqv-r+i,  ctN-r,  • , (Xn]  = 0j).  In 
the  presence  of  white  noise,  the  TFD  matrix  becomes: 

pin,  k)  = UDVh  + N = [UiDiVi]h.  (50) 

The  TFD  matrix  is  now  full  rank  with  its  last  r singular  values 
being  non-zero. 

Based  on  this  property,  a simple  and  fast  de-noising  algorithm 
can  be  designed  as  follows  51]: 

- Divide  the  TFD  matrix  into  sub-blocks  Bs. 

- Perform  SVD  on  each  sub-block  Bs  = UsDsVf  and  set  to  zero 
the  last  singular  values  that  are  smaller  than  a threshold  e. 

- Replace  the  submatrix  Bs  by  Bs  = UsDsVf  (Ds  is  the  modi- 
fied diagonal  matrix  D). 

- Reconstruct  the  denoised  matrix  p(n,k). 

Fig.  17  illustrates  the  enhancement  obtained  using  the  afore- 
mentioned (t,  /)  image  de-noising  algorithm. 

Another  SVD  based  de-noising  method  reduces  noise  by  apply- 
ing low-pass  filtering  to  singular  vectors  [52].  The  smoothed 
singular  vectors  are  then  used  to  synthesize  a de-noised 
TFD.  Experimental  results  demonstrate  that  this  method  can 
suppress  noise  without  deteriorating  the  basic  structure  of 
TFDs  [52]. 

b)  Wavelet  based  methods:  These  image  de-noising  techniques 
for  improving  the  SNR  of  (t,  f)  images  are  based  on  the  as- 
sumption that  most  of  the  (t,  f)  image  energy  is  concentrated 
in  only  few  coefficients  of  the  Wavelet  Transform  (WT).  The 
WT  is  obtained  by  decomposing  a TFD  into  several  bands  us- 
ing a low-pass  filter  and  a high-pass  filter  [53].  So,  a (t,  /) 
image  is  de-noised  by  assigning  zero  value  to  coefficients  be- 
low a certain  threshold  and  then  inverting  the  WT.  Most  of  the 
wavelet  de-noising  techniques  use  additive  white  noise  model 
for  noise.  This  model  may  be  considered  as  a rough  approxi- 
mation of  the  noise  in  QTFDs. 

c)  Morphological  processing:  These  techniques  exploit  the  fact 
that  signal  components  appear  as  ridges/curves  occupying 
large  connected  regions  in  a (t,  /)  plane.  On  the  other  hand, 
noise  is  randomly  distributed  in  a (t,  f)  plane  and  appears  as 
small  objects  or  disconnected  regions  [53].  These  small  objects 
can  be  eliminated  from  (t,  /)  images  by  using  the  image  open- 
ing operation  [54].  This  operation  compares  a predetermined 
template  (often  called  the  structuring  element)  with  objects  in 
an  image;  if  the  size  of  an  object  is  smaller  than  that  of  the 
template,  it  is  deleted  from  the  image;  otherwise  it  is  retained. 

d)  Anisotropic  diffusion  [55]:  Traditional  2D  low-pass  filtering 
schemes  result  in  blurring  in  all  directions  regardless  of  object 
boundaries.  The  anisotropic  diffusion  overcomes  this  limita- 
tion by  performing  edge-preserving  image  enhancement  such 
that  the  smoothing  is  performed  on  all  the  points  in  the  (t,  /) 
plane  except  the  points  lying  near  the  objects  boundaries  (i.e., 
near  the  boundaries  of  support  regions  of  the  signal  compo- 
nents). 

4. 1.3.  Cross- term  suppression 

Three  separate  techniques  are  reviewed  below  that  use  image 
processing  techniques. 

a)  Some  techniques  use  morphological  image  reconstruction  to 
suppress  interference  terms  in  the  WVD  11].  Such  techniques 
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Fig.  17.  Illustration  of  the  (t,  /)  image  denoising  algorithm,  (a)  The  MBD  of  a two-component  quadratic  FM  signal,  (b)  The  MBD  of  the  same  signal  corrupted  by  additive 
white  Gaussian  noise  (SNR  = 0 dB).  (c)  The  denoised  MBD  obtained  using  the  algorithm  described  in  [51]. 


usually  employ  two  images;  one  is  a marker  that  contains  the 
starting  points  of  the  transformation,  while  the  other  one  is 
a mask  that  constrains  the  transformation.  For  example,  the 
technique  described  in  [11]  processes  the  WVD,  a marker  in 
this  case,  using  the  characteristics  of  the  Spectrogram  to  form 
a mask.  The  processing  of  the  WVD  starts  at  peaks.  These 
peaks  are  then  dilated  while  the  thinned  Spectrogram  con- 
strains the  spreading  of  the  peaks.  This  process  continues  until 
the  marker  stops  changing.  This  technique  combines  the  ad- 
vantages of  the  WVD  and  Spectrogram  so  that  the  resulting 
representation  has  high  energy  concentration  and  is  also  cross- 
terms free. 

b)  Another  approach  uses  a non-linear  center  affine  filter  to  sup- 
press the  cross-terms  in  the  WVD  56];  such  a filter  is  de- 
signed to  exploit  the  local  statistics  of  a (t,  /)  image  to  adapt 
its  shape.  In  the  regions  of  high  variance  it  transforms  itself 
into  a low-pass  filter  while  in  the  regions  of  low  variance  it 
adapts  itself  into  an  identity  operator.  Thus  it  aims  to  reduce 
cross-terms  while  retaining  the  shape  of  auto-terms  [56]. 

The  limitation  of  the  above  cross-term  suppression  techniques 
[11,56]  is  that  they  fail  to  give  optimum  results  when  cross- 
terms overlap  with  auto-terms. 

c)  A hybrid  image  and  signal  processing  technique  can  be  used 
to  suppress  cross-terms  in  the  WVD  even  in  scenarios  when 
cross-terms  overlap  with  auto-terms  [12].  The  key  stages  of 
this  technique  are  shown  in  Fig.  18.  The  technique  uses  a 
blurred  cross-term-free  reduced-interference  TFD  to  locate  sig- 
nal component  in  a ( t , f ) plane.  The  reduced-interference  TFD 


is  then  segmented  using  the  connectivity  criterion  discussed  in 
Section  3.3;  each  segment  corresponds  to  a signal  component 
in  the  time  domain.  Fractional  Fourier  filtering  is  then  applied 
to  perform  time-varying  filtering  so  as  to  extract  signal  com- 
ponents from  the  original  signal.  The  extracted  components 
are  analyzed  separately  using  the  masked  WVDs  to  suppress 
both  inner  and  outer  interference  terms.  The  masked  WVDs 
of  the  separated  signal  components  are  added  up  to  obtain  a 
cross-term  free  high-resolution  TFD.  This  technique  can  also 
be  used  for  estimating  the  IFs  of  multi-component  signals  as 
the  extraction  of  signal  components  reduces  the  problem  of  IF 
estimation  of  multi-component  signals  to  the  basic  IF  estima- 
tion of  mono-component  signals.  This  technique  can  be  more 
broadly  applied  than  the  first  two  methods,  but  it  is  still  lim- 
ited to  signals  with  non-overlapping  (t,  /)  signal  components. 
The  step  by  step  illustration  of  this  third  approach  for  a two- 
component  signal  is  shown  in  Fig.  19. 

All  the  above  mentioned  (t,  /)  image  processing  methods  are 
useful  tools  for  the  improvement  of  TFDs  readability  and  energy 
concentration.  However,  these  methods  cannot  overcome  the  fun- 
damental resolution  limitation  of  QTFDs  in  dealing  with  closely 
placed  signal  components.  The  following  subsection  presents  a 
different  approach  combining  directional  filtering  and  image  pro- 
cessing techniques  for  defining  a high-resolution  TFD  that  can 
better  deal  with  such  a situation  and  resolve  closely  placed  sig- 
nal components. 
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Fig.  18.  Flow  chart  of  the  TFD  resolution  enhancement  technique  [30]. 


(d)  (e)  (f) 

Fig.  19.  Step  by  step  illustration  of  a hybrid  approach  combining  image  and  signal  processing  techniques  to  reduce  cross-terms  [12].  (a)  The  WVD.  (b)  The  reduced  interference 
QTFD.  (c)  (t,  f ) image  segmented  such  that  each  segment  corresponds  to  a signal  component,  (d)  The  WVD  of  the  first  signal  component  (extracted  using  fractional  filtering), 
(e)  The  WVD  of  the  second  component  (extracted  using  fractional  filtering),  (f)  The  desired  (t,  /)  representation  obtained  by  first  masking  the  WVDs  of  the  extracted  signal 
components  with  Spectrogram  and  then  adding  them,  thus  illustrating  the  image  processing  approach  to  improve  TFDs’  resolution. 


4.2.  Time-frequency  image  enhancement  using  adaptive  directional 
filtering 

The  performance  that  can  be  achieved  by  combining  a ( t , f) 
approach  with  an  image  processing  approach  can  be  further  en- 
hanced by  using  an  adaptive  directional  filtering  based  (t,  f ) image 
enhancement  technique.  This  refinement  includes  adapting  the  di- 
rection of  a (t,  /)  kernel  at  each  ( t , /)  point  on  the  basis  of  the 


direction  of  energy  concentration  resulting,  in  essence,  in  the  defi- 
nition of  another  high-resolution  TFD. 

4.2.1.  Review  of  signal  dependent  TFDs  with  automatic  parameter 
estimation 

Most  TFDs  require  optimization  of  their  parameters  to  obtain  a 
clear  and  accurate  (t,  /)  representation  with  the  best  possible  res- 
olution; e.g.  incorrect  selection  of  window  length  for  the  Spectro- 
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gram  can  result  in  a significantly  different  visual  representation  to 
the  correct  representation  (see  [33],  p.  39).  So,  there  is  a need  for 
some  kind  of  automatic  procedure  for  the  selection  of  parameters 
in  applications  requiring  real-time  computations.  Most  of  the  exist- 
ing adaptive  TFDs  optimize  the  parameters  of  their  kernels,  either 
globally  or  locally,  to  maximize  a certain  measure  representing  the 
quality  of  a TFD  [57-60].  Some  of  the  commonly  used  quantitative 
measures  for  assessing  TFD  quality  include  the  Kurtosis  (Ratio  of 
norms)  [61],  the  Renyi  Entropy  [60],  and  the  energy  concentration 
measure  [57].  These  and  other  measures  are  described  in  detail  in 
Section  5.3.  For  any  given  quality  measure,  the  parameters  of  TFDs 
can  be  optimized  by  either  using  an  exhaustive  search  technique 
or  employing  more  computationally  efficient  methods  such  as  the 
steepest  decent  recursive  algorithm  [57]. 

4.2.2.  Design  of  the  adaptive  TFD 

Previous  studies  have  reported  that  for  underspread  signals, 
directional  filtering  along  the  major  axis  of  the  auto-terms  can 
significantly  reduce  the  cross-terms  without  affecting  the  resolu- 
tion of  the  auto-terms  [32,62].  These  schemes  naturally  require 
prior  information  about  the  angle  formed  by  the  auto-terms  and 
the  time  axis.  The  angle  formed  by  the  auto-terms  and  the  time 
axis  can  be  estimated  from  the  moments  of  the  fractional  Fourier 
transform  [63  for  signals  whose  auto-terms  have  only  one  direc- 
tion of  energy  concentration  in  a ( t , f ) plane.  In  general,  there  are 
several  possible  directions  of  energy  concentration  in  the  (t,  /)  do- 
main and  this  would  require  separate  filters  for  all  these  directions 
at  a local  level.  Another  solution  is  to  design  a post-processing 
technique  to  further  improve  high-resolution  TFDs  by  adapting  the 
direction  of  a smoothing  kernel  locally  for  each  point  in  the  (t,  f) 
plane.  Such  adaptive  kernel  TFD  can  be  expressed  as  an  extension 
of  Eq.  (6),  i.e.: 

Padapt(f>  /)  — Y0(t,f)(t’  /)(**)  /)>  (51) 

where  pa dapt(G  /)  is  the  adaptive  directional  TFD  (ADTFD), 
Yo(t,f)(t,  /)  is  an  adaptive  kernel  whose  direction  is  adapted  at 
each  point  in  the  (t,  /)  plane,  and  p(t,  f ) is  the  QTFD  used  to 
start  the  process.  A directional  smoothing  kernel  should  have  the 
following  properties: 

• The  kernel  should  have  the  maximum  response  when  aligned 
with  ridges.  This  implies  that  the  kernel  should  have  low-pass 
characteristics  along  its  major  axis. 

• The  kernel  output  should  be  zero  for  non-ridge  points  as  oth- 
erwise spectral  leakages  would  be  observed  at  (t,  f ) points 
where  no  signal  is  present. 


0.3  0.4  0.5  O.G  0.7 

Frequency  (Hz) 


Fig.  20.  Illustration  of  the  direction  of  the  auto-terms  as  well  as  the  direction  of  the 
oscillation  of  cross-terms  in  the  WVD  of  a signal  composed  of  two  LFM  components. 


1.  Cross-terms  are  oscillatory  in  the  direction  of  their  major  axis, 
but  auto-terms  are  not,  as  illustrated  in  Fig.  20. 

2.  Filtering  along  the  major  axis  of  the  auto-terms  does  not  blur 
a TFD. 

3.  Both  auto-terms  and  cross-terms  appear  as  ridges  in  a QTFD. 


The  implication  is  that  smoothing  (low-pass  filtering)  along  the 
major  axis  of  cross-terms  will  suppress  these  cross-terms  as  they 
are  oscillatory  along  their  major  axis  (and  they  are  localized  away 
from  the  origin  in  the  ambiguity  domain)  while  smoothing  along 
the  major  axis  of  the  auto-terms  will  not  affect  their  resolu- 
tion [67].  The  direction  of  the  smoothing  filter  for  each  (t,  /)  point 
is  adapted  to  the  direction  of  the  major  axis  of  ridges  of  a magni- 
tude squared  QTFD.  The  reason  is  that  cross-terms  and  auto-terms 
of  a QTFD  appear  as  ridges  when  the  squared  magnitudes  are 
used  because  the  squared  magnitudes  of  cross-terms  have  non- 
oscillatory  characteristics  in  the  (t,  /)  domain  due  to  the  squaring 
operation.  The  ridge  direction  (i.e.,  the  direction  of  maximum  en- 
ergy of  a TFD)  is  estimated  by  maximizing  the  correlation  of  the 
DGF  with  the  magnitude  squared  QTFD;  expressed  as: 


%,/)  = arg  max 
o 


J J \p(t  -t'*  f - f')\2Ye(tj)(t\ 

R R 


(53) 


In  the  above  equation,  p(t,  f ) is  squared  to  remove  the  oscilla- 
tion in  the  cross-terms  as  discussed  above;  on  the  other  hand, 
Yo(t,f)(t>  f)  is  not  squared  as,  otherwise,  it  would  change  the 
shape  of  the  DGF  filter.  Eq.  (53)  can  be  numerically  solved  by  per- 
forming the  following  steps: 


The  second-order  partial  derivative  of  a Directional  Gaussian  Fil- 
ter (DGF)  is  selected  as  a directional  kernel  as  it  fulfills  the  above 
properties.  This  filter  has  a maximum  response  when  parallel  to 
ridges  because  of  low-pass  characteristics  along  its  major  axis  and 
its  response  reduces  to  zero  as  the  filter  is  orthogonal  to  ridges  be- 
cause of  differentiation  along  its  minor  axis  [64-66].  It  is  defined 
as: 

d2[e-((ate)2+(bfe)2)] 

/)  = —2 , 

oJq 

te=tzos(0)  + /sin(0),  fe  = /cos(0)  - tsin(0),  (52) 

where  6 is  the  rotation  angle  with  respect  to  the  time-axis,  a and 
b control  the  spread  of  the  DGF  along  the  time  and  frequency  axis 
respectively.  The  design  criterion  to  choose  the  direction  of  the 
adaptive  kernel  for  each  point  in  the  (t,  /)  plane  is  based  on  the 
following  observations  [67]: 


• Convolve  a magnitude  squared  TFD  with  I<  directional  filters 
such  that  the  angle  of  each  filter  is  given  by: 

0k  = 2ff,  k = h--,K.  (54) 

i\ 

• For  each  ( t , /)  point,  choose  the  angle  that  maximizes  the 
magnitude  of  a directionally  smoothed  TFD. 

Otj  = aTgmax\ydk(t,  f)  **  \p(t,  f)\2\2.  (55) 

K k (t,f) 

The  block  diagram  for  the  implementation  of  the  proposed  ADTFD 
is  shown  in  Fig.  21. 

4.2.3.  Performance  evaluation 

In  order  to  evaluate  the  performance  of  the  ADTFD,  let  us  con- 
sider three  multi-component  signals  with  the  characteristics  that 
their  TFD  auto-terms  are  parallel  neither  to  the  time  axis  nor  to 
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Fig.  21.  Implementation  of  the  adaptive  kernel  TFD.  (Note  that  Ok  = 2 nk/K,  where  k = 1,2,...  K.) 


Table  2 

Resolution  performance  of  selected  TFDs  for  a multi-component  signal  composed  of 
two  parallel  LFM  signals. 


TFD 

Optimal  kernel  parameters 

P 

CKD 

c = 1,  E — 0.12,  D — 0.05 

0.8919 

Adaptive  TFD 

a — 2,  b — 20 

0.9648 

S-Method 

Hamming  window  length  = 4 s 

0.8778 

Extended  Modified  B distribution 

a = 0.18,  £ = 0.06 

0.8376 

Modified  B-distribution 

£ = 0.16 

0.7628 

the  frequency  axis.  Most  separable  kernel  TFDs  fail  to  achieve  a 
high  (t,  /)  resolution  for  these  signals  because  of  the  absence  of 
the  (t,  /)  energy  concentration  direction  parameters  in  their  de- 
sign. Such  parameters  relate  to  the  presence  and  localization  of 
the  auto-terms. 

a)  Two  parallel  LFM  signals:  The  multi-component  signal  com- 
posed of  two  parallel  LFM,  described  in  Section  2.3.4,  is  used, 
for  consistency  and  continuity,  to  evaluate  the  performance 
of  the  DGF  adaptive  (t,  /)  image  enhancement  technique,  as 
compared  with  the  CKD  and  S-method,  as  they  are  found  in 
this  study  to  be  the  best  performing  QTFDs  in  terms  of  res- 
olution and  cross-term  suppression  capabilities.  Fig.  22  illus- 
trates the  results.  The  performance  of  the  proposed  TFD  is 
evaluated  using  the  Boashash-Sucic’s  criterion  [38],  with  re- 
sulting P values  recorded  in  Table  2.  The  proposed  adaptive 
TFD  outperforms  the  CKD  by  7%  in  terms  of  its  ability  to  retain 
energy  concentration  of  auto-terms  while  suppressing  cross- 
terms. 

b)  Multi-component  signal  composed  of  signal  components  of 
varying  amplitudes  and  frequency  rates:  let  us  consider  a 
multi-component  signal  composed  of  two  LFMs,  one  Gaussian 
atom  and  one  tone,  expressed  as: 

_ | si  (0  + s2(0  + 53(f)  + S4(t),  0 <t<T,  . . 

^ — [ 0,  otherwise,  ^ 

where  the  signal  components  have  the  characteristics  shown 
below. 


Signals 

si(t) 

S2  (t) 

Formula 

cos(2jt  (0.0006t2  +0.16t))  cos(2jt  (0.0006t2  + 0.10) 

Signals 

S3  (0 

s 4(0 

Formula 

0.75cos(0.087Tt) 

2e-0.0025(t— 175)2  sin(0.257Tt) 

The  IFs  of  the  signal  components  are, 

IF 

hit) 

hit)  hit)  hit) 

Formula 

0.0012t  + 0.16 

0.0012t  + 0.1  0.04  0.125 

The  sampling  frequency  is 
T = 255  s for  the  signal  1 

1 Hz  and  the  signal  duration  is 
components  Si(t),S2(t)  and  S3(t). 

The  corresponding  effective  time  duration  Te  and  effective 
bandwidth  Be  of  S4(t)  are  given  by  Te  = 20^/n /2  s and 
Be  = 0.05/V27 r Hz  (see  [68]  for  these  definitions).  The  to- 
tal signal  has  more  than  one  direction  of  energy  concentration 
and  the  distribution  of  the  signal  energy  along  each  direction 
is  not  uniform.  Given  such  complexity,  this  signal  is  difficult  to 
analyze  using  existing  fixed-kernel  TFDs  because  these  meth- 
ods do  not  adapt  the  kernel  locally  at  each  (t,  /)  point  and  any 
particular  choice  would  be  adapted  only  to  one  component, 
but  would  blur  the  others.  The  performance  of  the  DGF  (t,  /) 
image  enhancement  technique  is  compared  with  the  CKD  and 
another  adaptive  TFD  defined  earlier  [58,59]  with  results  as 
shown  in  Fig.  23.  The  ADTFD  gives  high  energy  concentration 
for  all  signal  components,  whereas  the  CKD  and  the  adaptive 
TFD  [59]  fail  to  concentrate  energy  for  the  weak  signal  compo- 
nent i.e.,  the  tone  as  observed  in  the  left-hand  side  of  Fig.  23. 
c)  Real-life  EEG  signal:  The  previous  experiment  is  repeated  us- 
ing a real-life  EEG  signal  described  in  Section  2.3.4.  Fig.  24 
shows  that  the  ADTFD  gives  high  energy  concentration  for  the 
low-amplitude  signal  components  appearing  e.g.,  above  5 Hz, 
whereas  other  TFDs  fail  to  concentrate  energy  for  the  weak 
signal  components. 

The  advantage  of  QTFDs  such  as  the  CKD  is  that  they  are  useful  for 
most  non-stationary  signals  with  different  performance  depending 
on  the  type  of  signal  analyzed.  In  scenarios  when  such  methods 
fail  to  give  result  with  enough  resolution,  then  image  processing 
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(a)  (b) 


Frequency  (Hz) 

(c) 


Fig.  22.  ( t , f ) representations  of  two  parallel  LFM  signals,  (a)  The  CKD  (c  = 1,  E = 0.12,  D = 0.05).  (b)  The  adaptive  (t,  f ) image  enhancement  by  DGF  (a  — 2,b  — 20)  and 
(c)  the  S-Method  (Hamming  window  length  = 4 s). 


Fig.  23.  The  (t,  f)  representations  of  a multi-component  signal  composed  of  signal  components  of  varying  amplitudes  and  chirp  rates,  (a)  The  CKD  (C  = 0.1,  E = 0.065, 
D = 0.065).  (b)  The  Adaptive  kernel  TFD  [59].  (c)  The  ADTFD  (a  = 3,b  = 8). 


based  adaptive  techniques  can  be  used  as  shown  in  Figs.  22,  23 
and  24.  These  examples  illustrate  the  possible  advantage  to  use 
image  processing  techniques  such  as  directional  filtering  for  defin- 
ing high-resolution  cross-term  free  TFDs.  The  additional  load  for 
doing  so  is  assessed  in  the  next  section. 

4.2.4.  Computational  cost  of  the  adaptive  TFD 

The  computation  of  the  adaptive  TFD  involves  the  following 
steps. 


1.  Computation  of  a QTFD:  The  computational  cost  of  this  step 
is  0(NM  log  M)  as  it  requires  N FT  operations  for  a sig- 
nal of  length  N and  the  computational  cost  of  each  FT  is 
O(MlogM),  where  M is  the  total  number  of  frequency  do- 
main samples  [69]. 

2.  Computation  of  an  adaptive  kernel:  The  computation  of  an 
adaptive  directional  kernel  requires  the  estimation  of  an  op- 
timal rotation  angle  for  each  point  in  a (t,  f ) plane.  It  can  be 
implemented  by  convolving  a QTFD  with  I<  directional  kernels 
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Fig.  24.  The  (t,  f ) representations  of  a real-life  EEG  signal,  (a)  The  CKD  (c  = 1,  D = 0.2,  E = 0.05).  (b)  The  Adaptive  kernel  TFD  [59].  (c)  The  ADTFD  (a  = 3,  b = 8). 


and  choosing  the  one  with  maximum  response  at  each  ( t , /) 
point.  The  computational  cost  of  this  step  is  O(KNM)  because 
of  the  convolution  of  a QTFD  with  I<  directional  kernels;  given 
that  the  computational  cost  of  single  convolution  operation  is 
0(NM ) as  the  required  number  of  multiplications  and  addi- 
tions is  proportional  to  the  total  number  of  points  in  a (t,  /) 
plane. 

3.  Convolution  of  a QTFD  with  an  adaptive  kernel:  The  compu- 
tation cost  of  this  step  is  0(NM ) because  it  involves  only  a 
single  convolution. 

The  total  computational  cost  of  this  algorithm  is  equal  to  the  sum 
of  the  computational  costs  of  all  the  above  mentioned  steps,  i.e., 
0(NM log M + KNM  + NM)  or  0(NMlogM  + (K  + 1)NM).  The 
computational  load  can  be  significantly  reduced  by  exploiting  the 
fact  that  ( t , /)  matrices  are  sparse  so  most  of  the  ( t , /)  points  do 
not  need  filtering  at  all. 

This  section  has  illustrated  that  the  image  processing  based 
methods  can  be  used  as  post-processing  operations  to  enhance 
QTFDs.  Such  post-processing  operations  not  only  improves  the 
readability  of  QTFDs  for  signal  analysis,  but  also  helps  in  extract- 
ing highly  discriminating  features  for  pattern  recognition  applica- 
tions. A methodology  for  defining  such  (t,  /)  features  from  high- 
resolution  TFDs  is  discussed  in  the  next  section. 

5.  Time-frequency  feature  extraction 

Signal  feature  extraction  is  a key  stage  of  any  overall  scheme 
for  pattern  recognition  and  classification  of  abnormalities  or,  more 
generally  for  any  machine  learning  design  and  algorithm  that  re- 
quires automatic  decision  making.  The  spectral  characteristics  of  a 
signal  are  traditionally  used  to  obtain  specific  details  about  the  un- 
derlying signal.  Such  frequency-domain  features  [26]  include  spec- 
tral flatness,  spectral  flux,  entropy  and  average  frequency.  In  the 
case  of  non-stationary  signals  with  spectral  content  varying  with 
time,  such  frequency-domain  features  may  fail  to  give  sufficient 
relevant  discriminating  information.  The  following  subsections  dis- 
cuss in  detail  the  (t,  /)  extension  of  some  selected  frequency- 


domain  features,  as  an  illustration  of  the  (t,  /)  feature  extraction 
methodology. 

5.1.  Time-frequency  flatness 


The  (t,  /)  flatness  feature  extends  the  notion  of  spectral  flat- 
ness; it  is  used  to  measure  the  relative  concentration  of  energy 
distribution  in  a TFD  or  conversely,  its  spread.  A high  (t,  /)  flat- 
ness indicates  that  the  energy  is  uniformly  spread  in  the  TFD  while 
a low  (t,  /)  flatness  indicates  that  the  energy  is  concentrated  in 
specific  regions  within  the  TFD.  This  feature  helps,  e.g.,  to  discrim- 
inate TFDs  of  signal  components  embedded  in  noise  from  TFDs  of 
pure  noise. 

In  order  to  derive  the  mathematical  formulation  for  the  (t,  /) 
flatness,  let  us  first  consider  the  definition  of  the  spectral  flatness. 
The  spectral  flatness  is  the  geometric  mean  of  the  FT  of  a signal 
normalized  by  its  arithmetic  mean;  expressed  as  [70]: 


SF  = M^. 

(Efcii  \m\) 


(57) 


The  (t,  /)  extension  of  the  spectral  flatness  can  then  be  defined 
by  replacing  the  spectrum  of  the  signal  by  its  TFD.  Eq.  (57)  then 
becomes  the  geometric  mean  of  a TFD  normalized  by  its  arithmetic 
mean,  which  can  be  expressed  as: 


TFflatness  — 


(n^^pljhk^ 
mEnl,Ef=i  Pln,k] 


(58) 


The  (t,  /)  flatness  assumes  a zero  value  even  if  there  is  a single 
zero  in  a TFD.  So,  in  practical  implementations  all  zeroes  of  a TFD 
are  replaced  by  very  small  values  (e.g.,  epsilon  in  Matlab). 


5.2.  Time-frequency  flux 


The  (t,  /)  flux  extends  the  stationary  spectral  flux  measure 
to  non-stationary  signals.  The  spectral  flux  measures  the  rate  of 
change  of  the  frequency  content  of  a signal  with  time,  whereas 
the  ( t , /)  flux  measures  the  rate  of  change  of  the  signal  energy  in 
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the  ( t , /)  plane  both  along  time  and  frequency  axes.  It  is  expressed  resolution  but  different  levels  of  cross-term  suppression.  It  is 

as:  defined  as  ([33],  p.  299,  [72]): 


N-lM-q 

TFflux  = Y)  53  + l> k + <?]  - /°tn’  fe]|. 

n= 1 k=l 


(59) 


TFRNE  = — - log2 


p[n,k] 


E^ilPMJ 


(62) 


where  / and  q are  predetermined  values  that  depend  on  the  rate  of 
change  of  signal  energy  in  a (t,  /)  plane.  The  values  of  the  param- 
eters used  in  this  study  are:  / = 1 and  q = 1.  This  measure  can  be 
used  to  detect  seizures  in  EEG  signals  as  previous  studies  indicate 
that  the  energy  of  the  newborn  EEG  seizure  signals  varies  slowly 
along  both  time  and  frequency  axes.  This  is  inferred  from  the  fact 
that  these  signals  can  either  be  modeled  as  piecewise  LFM  signals 
with  slowly  changing  frequency  or  as  train  of  impulses  [1].  On  the 
other  hand,  the  EEG  background  appears  as  a random  pattern.  It 
follows  that  the  (t,  /)  flux  is  expected  to  have  a lower  value  for 
EEG  seizure  signals  and  a larger  value  for  EEG  background. 

5.3.  Time-frequency  measures  of  complexity 

A measure  of  complexity  can  be  defined  by  interpreting  a 
TFD  as  a quasi-probability  distribution  function,  and  using  entropy 
measures  as  characteristics.  The  interpretation  is  that  a highly  con- 
centrated TFD  with  a small  number  of  components  has  a lower 
entropy  than  a signal  with  a large  number  of  signal  components. 
Some  of  the  entropy  measures  used  in  previous  studies  are  de- 
fined below  and  compared  in  terms  of  their  relevance  to  the  main 
objectives  of  this  study. 

1.  Time-Frequency  Shannon  Entropy  (TFSE):  It  is  defined  as  ([33], 
p.  299): 


N M 

TFSE  = - £ £ p[n,  k]  log2  (p[n,  k\).  (60) 

n=l  k= 1 

The  major  limitation  of  the  (t,  /)  Shannon  entropy  defined 
above  is  that  it  cannot  be  used  for  the  majority  of  QTFDs  as 
these  can  take  negative  values,  since  the  above  expression  is 
then  not  valid  (log  of  the  TFD).  To  remedy  this,  the  Renyi  En- 
tropy was  used  instead  ([33],  p.  298). 

2.  Time-Frequency  Renyi  Entropy:  It  is  defined  as  ([33],  p.  298): 


Some  other  complexity  measures  that  are  not  related  to  entropy, 
but  can  be  used  to  discriminate  a low  resolution  TFD  from  a high- 
resolution  TFD,  are  mentioned  below. 


• Kurtosis  (or  Ratio  of  norms):  It  is  defined  as  the  ratio  of  the 
fourth-order  moment  of  a TFD  with  the  squared  second  order 
moment  [61]: 


RN  = 


En=l  Ektl  P>,/<] 
(Enll  El  1 P2ln,k]p- 


(63) 


This  kurtosis  is  used  to  estimate  the  “peakedness”  of  TFDs  by 
calculating  the  above  ratio.  A high  value  of  the  ratio  of  norms 
indicates  that  the  TFD  is  highly  concentrated  while  a low  value 
indicates  that  the  TFD  is  poorly  concentrated.  In  many  real-life 
situations,  signal  components  can  have  significant  variations 
in  their  relative  amplitudes.  For  such  signals,  a TFD  optimized 
based  on  the  kurtosis  can  achieve  a high  energy  concentration 
for  strong  components  at  the  expense  of  poor  energy  concen- 
tration for  weak  signal  components. 

• Energy  concentration  measure:  One  approach  to  estimate  the 
spread  of  the  energy  distribution  in  the  ( t , /)  domain  for 
any  given  TFD  is  to  measure  its  support  region  (e.g.,  the 
number  of  non-zero  values).  A highly  concentrated  TFD  will 
have  a smaller  region  of  support  as  compared  to  a blurred 
TFD.  The  following  energy  concentration  measure  can  be  used 
for  estimating  this  support  region  for  a normalized  TFD  (i.e., 
EE=iEfc=i/°[n.k]  = U[57]. 


EC  = 


N M 


q 


Q>  1 


(64) 


Unlike  the  kurtosis,  TFDs  optimized  on  the  basis  of  this  mea- 
sure do  not  suffer  from  the  problem  of  poor  energy  concen- 
tration for  weak  signal  components  [57]. 


N M \ 

(61) 

n= 1 k=  1 / 

with  a > 2;  for  a = 2,  the  above  measure  becomes  equal  to 
zero  for  a normalized  TFD  (i.e.,  J2n=i  Hh=\  Paln,  k]  = 1 for 
a — 2).  The  (t,  /)  Renyi  Entropy  overcomes  the  limitations  of 
the  TFSE  by  applying  the  log  operator  to  J2n= l ZjUii  [n,  /<], 
where  J2n=i  J2k= i Pa[nJ<\  > 0 for  any  valid  TFD.  The  (t,  /) 
Renyi  entropy  is  a useful  feature  for  the  classification  of  non- 
stationary signals  because  of  its  sensitivity  to  the  number  of 
signal  components,  their  time  duration  and  bandwidth,  and 
their  amplitude  ratios  [71]. 

3.  Time-frequency  normalized  Renyi  entropy  (TFNRE):  Zero- 
mean  cross-terms  are  reduced  in  the  (t,  /)  Renyi  entropy  for 
odd  values  of  a because  of  the  summation  operation,  so  the 
( t , /)  Renyi  entropy  cannot  discriminate  a high-resolution  TFD 
with  significantly  reduced  cross-terms  from  a high-resolution 
TFD  without  any  suppression  of  cross-terms.  This  limitation  of 
the  (t,  /)  Renyi  entropy  can  be  overcome  by  first  normalizing 
the  TFD  with  the  sum  of  its  absolute  values  so  that  cross- 
terms have  an  overall  effect  of  reducing  the  TFNRE.  The  TFNRE 
is  thus  preferred  to  evaluate  the  resolution  performance  of 
a TFD  as  it  can  differentiate  between  two  TFDs  of  the  same 


In  the  reminder  of  this  study,  the  normalized  Renyi  entropy  is 
used  as  the  results  confirm  that  it  gives  better  classification  re- 
sults as  compared  to  the  Renyi  entropy.  There  are  other  measures 
of  complexity  that  have  been  defined  and  used,  e.g.,  in  [73];  but 
they  are  not  considered  here  due  to  space,  scope  and  relevance 
limitations.  Note  that  this  section  has  illustrated  a methodology 
to  define  new  (t,  /)  features  by  extending  frequency-domain  fea- 
tures to  the  (t,  /)  domain.  A similar  methodology  can  also  be  used 
to  define  new  features  by  extending  time-domain  features  to  the 
(t,  /)  domain  [26].  A list  of  such  features  is  given  in  Appendix  D. 

6.  Results,  discussion  and  further  insights 

Receiver  Operating  Characteristics  (ROC)  analysis  is  used  to 
compare  the  performance  of  the  (t,  /)  features  with  the  corre- 
sponding frequency-domain  features  for  the  detection  of  seizure 
activity  [74].  The  area  under  ROC  curve  (AUC)  is  used  as  a per- 
formance metric.  The  AUC  of  a feature  is  the  probability  that  the 
feature  would  have  a higher  value  for  a randomly  chosen  positive 
example  than  for  a randomly  chosen  negative  example  [74].  A set 
of  200  epochs  of  newborn  EEG  background  and  a set  of  200  epochs 
of  newborn  EEG  seizure  are  randomly  extracted  from  a database 
for  this  study.  The  length  of  each  segment  of  database  is  256  sam- 
ples (with  32  Hz  sampling  frequency),  corresponding  to  a duration 


24 


B.  Boashash  etal./  Digital  Signal  Processing  40(2015)  1 -30 


Table  3 

AUC  analysis  of  the  frequency-domain  features. 

Features 

AUC  values 

Spectral  Flatness 

0.6632 

Spectral  Entropy 

0.5901 

Spectral  Flux 

0.5483 

Table  4 

AUC  analysis  of  the  (t,  /)  features  obtained  by  extending  the  frequency-domain  fea- 
tures to  the  (t,  /)  domain.  These  features  are  extracted  from  the  WVD,  Spectrogram, 
ADTFD,  EMBD,  and  CKD. 


Features 

ADTFD 

CKD 

EMBD 

WVD 

Spectrogram 

( t , /)  Flatness 

0.9092 

0.7997 

0.7610 

0.6922 

0.6254 

(t,  f)  Entropy 

0.6631 

0.7207 

0.7228 

0.6157 

0.7200 

(t,  /)  Flux 

0.7792 

0.7103 

0.5490 

0.7344 

0.6280 

of  8 s.  The  relevant  database  is  described  in  [75],  and  the  relevant 
components  used  in  this  study  are  provided  as  Supplementary  ma- 
terial. 

In  order  to  extract  EEG  features,  each  segment  is  analyzed  by 
selected  TFDs;  these  are  the  Spectrogram  (SPEC),  WVD,  EMBD, 
CKD,  and  ADTFD.  The  parameters  that  lead  to  the  highest  AUC 
values  are  selected  for  each  TFD  by  performing  an  experimental 
non-exhaustive  search.  The  selected  smoothing  parameters  for  the 
EMBD  are  a = 0.05  and  p = 0.05.  The  selected  window  length  L 
for  the  SPEC  is  the  Hamming  window  of  L = 75  samples.  The  se- 
lected parameters  for  the  CKD  are  c = 1,  D = 0.04  and  E = 0.04. 
The  selected  parameters  for  the  ADTFD  are  a = 3 and  b = 8.  Note 
that  we  have  not  used  the  MBD  and  CSK-TFD  for  performance  eval- 
uation as  they  are  special  cases  of  the  EMBD  and  CKD  as  discussed 
in  Section  2.  The  performance  of  ( t , /)  features  is  compared  with 
frequency-domain  features  using  an  AUC  analysis.  Table  3 shows 
AUC  values  for  the  frequency-domain  features  while  AUC  values 
for  the  corresponding  (t,  /)  extensions  are  indicated  in  Table  4. 

The  key  observations  regarding  the  AUC  analysis  of  signal  re- 
lated features  as  observed  from  Tables  3 and  4 are  given  below. 

1.  (t,  /)  features,  which  are  extensions  of  the  frequency-domain 
features,  lead  to  higher  AUC  values  than  corresponding  fre- 
quency-domain features  for  all  TFDs  with  the  exception  of  the 
Spectrogram.  The  Spectrogram  has  lower  AUC  value  for  the 
(t,  /)  flatness  than  the  corresponding  frequency-domain  fea- 
ture i.e.,  the  spectral  flatness. 

2.  The  (t,  f)  flatness  is  the  best  performing  (t,  /)  feature  with 
AUC  value  of  0.9092  for  the  adaptive  TFD. 

3.  The  adaptive  TFD  is  the  most  suitable  TFD  for  extracting  (t,  /) 
flatness  and  (t,  /)  flux  because  of  higher  AUC  values  of  these 
features  for  the  adaptive  TFD;  the  CKD  is  the  second  best  TFD 
for  (t,  /)  flatness  while  the  WVD  is  the  second  best  TFD  for 
(t,  /)  flux. 

4.  The  EMBD  is  the  best  TFD  for  extracting  the  (t,  /)  entropy 
followed  by  the  CKD. 

5.  The  poor  performance  of  the  Spectrogram  for  extracting  the 
(t,  /)  flatness  can  be  explained  by  the  sensitivity  of  the  Spec- 
trogram to  its  window  length,  which  makes  it  difficult  to  find 
an  optimal  window  for  a large  class  of  signals. 

6.2.  Discussion  and  interpretation  of  results 

The  experimental  findings  indicate  that  the  performance  of 
frequency-domain  features  can  be  significantly  improved  by  ex- 
tending them  to  the  (t,  /)  domain.  It  is  important  to  use  high- 
resolution  TFDs  for  extracting  (t,  /)  features  as  the  best  perform- 
ing (t,  f)  features  are  extracted  from  the  ADTFD,  which  is  also  the 
best  performing  TFD  in  terms  of  resolution  and  cross-term  sup- 
pression capabilities.  These  observations  indicate  that,  for  a given 


TFD,  the  performance  of  a (t,  /)  abnormality  classification  tech- 
nique can  be  directly  related  to  the  resolution  of  a TFD;  hence  the 
need  to  define  data-dependent  high-resolution  QTFDs  suitable  for 
specific  classes  of  signals.  The  improvements  obtained  as  a result 
of  (t,  /)  translation  of  features  can  be  explained  as  follows: 

• The  Spectral  flatness  and  Spectral  entropy  measure  the  unifor- 
mity of  energy  distribution  in  the  frequency-domain.  They  can 
discriminate  a narrowband  signal  from  a wideband  signal,  but 
they  cannot  discriminate  two  wideband  signals;  e.g.  an  LFM 
signal  cannot  be  discriminated  from  white  noise.  The  (t,  /) 
flatness  and  (t,  /)  entropy  can  discriminate  random  noise  from 
signals  whose  energy  is  concentrated  in  the  ( t , /)  domain. 
The  energy  of  EEG  seizure  signals  is  usually  concentrated 
along  the  IFs  of  their  components,  whereas  the  energy  of  EEG 
background  is  usually  spread  in  the  (t,  /)  domain.  Therefore, 
seizure  signals  have  lower  (t,  /)  flatness  and  (t,  /)  entropy  as 
compared  to  background  signals. 

• The  better  performance  of  the  (t,  /)  entropy  can  be  explained 
by  the  observation  that  EEG  seizure  signals  have  slow  variation 
of  signal  energy  along  both  time  and  frequency  axis,  whereas 
the  background  has  a random  distribution  of  signal  energy. 
This  implies  that  for  seizure  signals  there  is  a little  variation  of 
signal  energy  along  the  diagonal  axis.  The  (t,  /)  flux  exploits 
this  information  by  measuring  the  total  variation  in  the  sig- 
nal energy  along  the  diagonal  axis,  which  results  in  low  values 
for  seizure  signals  and  higher  values  for  the  background.  On 
the  other  hand,  the  spectral  flux  only  measures  the  variation 
in  spectral  content  of  EEG  signals  and  cannot  measure  the  rate 
of  change  of  signal  energy  along  the  diagonal  axis. 

In  order  to  extend  and  apply  the  proposed  (t,  /)  pattern  recogni- 
tion approach  to  make  it  suitable  for  a wide  range  of  applications 
and  real-time  implementation,  several  key  points  need  further  in- 
vestigation, as  described  next. 

6.2.  Additional  time-frequency  insights  to  further  improve  performance 

6.2.2.  Designing  high-resolution  adaptive  TFDs 

The  performance  of  all  QTFDs  depends  on  the  choice  of  certain 
parameters.  For  example,  the  shape  and  size  of  an  analysis  window 
controls  the  time  and  frequency  resolution  of  the  Spectrogram. 
These  parameters  are  data  dependent  and  one  set  of  parameters 
cannot  give  optimal  results  for  all  classes  of  data.  Moreover,  it  is 
difficult  to  optimize  a single  global  kernel  for  all  the  points  in  the 
TFD  as  there  can  be  many  components  in  a signal  with  different 
directions  of  energy  concentration.  In  order  to  automatically  select 
the  optimal  parameters  that  lead  to  a high  energy  concentration 
TFD  in  which  cross-terms  and  noise  are  reduced,  several  adaptive 
methods  were  proposed  [58,61,59,76].  These  methods  (e.g.,  [61, 
59])  automatically  adapt  the  parameters  of  the  (t,  f)  kernel  at  each 
time  instant,  but  there  is  a need  in  some  situations  to  extend  these 
algorithms  so  that  the  kernel  parameters  are  optimized  locally  at 
each  (t,  f)  point  as  done  in  Section  4.2. 

6.2.2.  Designing  multidirectional  fixed-kernel  TFDs 

Separable-kernel  TFDs  give  good  performance  for  signals  whose 
energy  is  either  concentrated  along  the  time  axis  or  the  frequency 
axis  in  the  (t,  /)  domain  or  along  a direction  with  small  deviation 
from  one  of  these  two  axes.  However,  these  TFDs  fail  to  give  opti- 
mal energy  concentration  for  signals  whose  energy  is  concentrated 
along  a direction  significantly  away  from  both  time  axis  and  fre- 
quency axis  in  the  (t,  /)  domain.  In  order  to  obtain  high-resolution 
(t,  /)  representation  for  such  signals,  a rotation  parameter  can  be 
included  in  the  formulation  of  smoothing  kernels  [63].  For  exam- 
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Fig.  25.  The  (t,  f ) representations  of  a multi-component  signal  composed  of  signal  components  of  varying  amplitudes  and  chirp  rates,  (a)  The  CKD  (c  = 0.1,  E = 0.065, 
D = 0.065).  (b)  The  multi-directional  kernel  (6>i  = 0°,  02  = 30°,  c = 0.1,  D — 0.03).  (c)  The  ADTFD  (a  = 3,  b = 8). 


pie,  an  additional  parameter  0 can  be  included  to  allow  for  the 
rotation  of  the  CSK,  resulting  in  the  following  expression. 

{,  v cost#)— r sin(0)  .o  7 , . . „ „ 

e(  d > , | cos(0)v  — sin(0)r|  < D, 

0,  otherwise, 


CKD  fails  to  resolve  the  close  components  due  to  the  lack  of  di- 
rection parameter  in  its  formulation.  The  adaptive  TFD,  due  to 
the  local  adaptation  of  its  kernel,  has  resolved  the  close  compo- 
nents without  degrading  the  energy  concentration  for  the  Gaussian 
atoms.  This  MDD  should  therefore  be  a preferred  TFD  for  non- 
stationary multi-component  and  multi-directional  signals  such  as 
(65)  EEG  seizures. 


where  0 is  the  angle  of  the  kernel  with  the  Doppler  axis  in  the 
ambiguity  domain  or  the  time  axis  in  the  (t,  /)  domain.  Rotated 
(t,  /)  kernels  cannot  give  ideal  representations  for  signals  that 
have  more  than  one  possible  directions  of  energy  concentration 
in  the  (t,  /)  domain.  Adaptive  kernel  methods,  such  as  the  one  de- 
fined in  Section  4.2.2,  are  well  suited  for  such  signals,  but  these 
methods  are  computationally  expensive.  One  way  to  analyze  sig- 
nals with  multiple  directions  of  energy  concentration  in  the  (t,  /) 
domain  is  to  perform  smoothing  in  multiple  directions.  An  exam- 
ple is  provided  by  the  following  expression  which  defines  a new 
multidirectional  kernel  as  a summation  of  a predetermined  num- 
ber of  directional  kernels  (as  defined  in  Eq.  (65)): 

ec  Nd 

g(v,r)  = — Y'g(v,T,ei),  (66) 

N°  ,=1 

where  No  is  the  total  number  of  directional  kernels,  and  0;  is  the 
angle  of  the  ith  kernel. 

In  order  to  compare  the  performance  of  this  multi-directional 
distribution  (MDD)  with  the  other  methods,  let  us  use  the  multi- 
component  signal  expressed  in  Eq.  (56).  This  signal  is  selected  as 
it  has  multiple  directions  of  energy  concentration  in  the  (t,  /) 
domain,  so  conventional  methods  that  do  not  take  direction  into 
account  cannot  be  used  to  optimally  analyze  this  signal.  Fig.  25 
shows  this  signal  analyzed  using  the  MDD  and  the  two  best  per- 
forming TFDs  that  are  the  CKD  and  adaptive  TFD.  The  MDD  gives 
highest  energy  concentration  for  the  two  parallel  LFM  signals  and 
tone,  but  fails  to  concentrate  energy  for  the  Gaussian  atom.  The 
lowest  region  of  performance  of  the  MDD  kernel  concerns  the 
Gaussian  atom,  which  is  due  to  a lack  of  specific  direction.  The 


6.2.3.  Time-frequency  image  enhancement  of  TFDs 

Some  of  the  state  of  the  art  (t,  /)  image  enhancement  tech- 
niques discussed  in  Section  4 and  others  such  as  [77]  can  be  used 
as  post  processing  operations.  The  aim  is  either  to  improve  the 
resolution  of  cross-term-free  low  resolution  TFDs  (e.g.,  the  Spec- 
trogram) by  using  de-blurring  methods  or  to  suppress  cross-terms 
in  high-resolution  TFDs  (e.g.,  the  WVD).  By  including  such  image 
enhancement  techniques  within  the  overall  methodology  of  abnor- 
mality detection,  improvements  in  performance  can  be  obtained  in 
applications  such  as  classification  of  physiological  signals.  For  ex- 
ample, this  study  illustrates  that  the  two  best  (t,  /)  features  (i.e., 
(t,  /)  flatness  and  (t,  /)  flux)  for  the  detection  of  seizure  activity  in 
EEG  signals  are  obtained  from  (t,  /)  images  enhanced  by  the  DGF 
(i.e.  the  ADTFD).  The  expense  for  this  result  is  an  additional  com- 
putational cost,  whereas  among  the  non-adaptive  methods  (i.e., 
without  enhancement)  the  CKD  gives  the  best  performing  features 
and  this  method  is  computationally  less  expensive. 

6.2.4.  Extraction  of  geometric  features 

Signal  components  appear  as  pockets  of  energy  concentration  in 
the  (t,  f)  domain  or  as  segments  in  (t,  f)  images.  In  these  cases 
image  segmentation  techniques  can  then  be  used  to  locate  these 
pockets  of  energy  concentration  [78  . Geometric  features  can  then 
be  extracted  from  these  segments  to  get  information  such  as  the 
number  of  signal  components,  their  relative  positions  in  the  ( t , f) 
plane  and  the  energy  distribution  of  each  signal  component  in  the 
(t,  /)  plane.  This  information  is  critical  for  abnormality  detection 
using  change  analysis.  For  example,  in  the  case  of  a heart  sound 
signal,  the  murmur  appears  as  an  additional  pocket  of  energy  in 
the  ( t , f)  domain  (see  Fig.  4 in  [5]).  So,  the  number  of  pockets 
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Fig.  26.  The  proposed  block  diagram  of  a multi-channel  (t,  f ) abnormality  detection 
methodology  using  channel-by-channel  approach. 


Fig.  27.  The  proposed  block  diagram  of  the  multi-channel  (t,  /)  abnormality  de- 
tection methodology  using  spatial  TFDs.  Arrows  in  red  indicate  cross-TFDs,  while 
arrows  in  blue  indicate  auto-TFDs. 

of  energy  concentration,  their  location  and  their  geometric  prop- 
erties can  be  used  as  features  in  machine  learning  algorithms  for 
automatically  detecting  an  abnormality  in  a heart  sound  signal. 

6.2.5.  Extraction  of  features  from  multi-channel  signals 

In  many  cases,  multi-channel  and  multi-modal  recordings  are 
produced,  as  in  the  case  of  physiological  signals.  Let  us  consider 
two  approaches  for  extracting  (t,  /)  features  from  multi-channel 
physiological  signals,  as  outlined  below. 

1.  Analyze  each  channel  separately  using  a high-resolution  TFD 
and  then  extract  features  from  the  TFD  as  shown  in  Fig.  26. 
This  approach  ignores  mutual  information  among  various 
channels. 

2.  Analyze  a multi-channel  signal  using  a spatial  TFD  (STFD) 
and  then  extract  features  from  the  STFD  matrix  as  shown  in 
Fig.  27.  For  a multi-channel  signal  x[n\  = [x\  [n],  *2[n],  ^[n], . . . 
Xp[n]],  an  STFD  matrix  for  time  index  n and  frequency  index  /< 
is  defined  as  ([33],  pp.  325): 

M N 

p[n,k]=  E E G[n  - p,  m]x[p  + m]xH 

m=—M  p=—N 

x[p-m]e-4i7t^,  (67) 


where  G[n,m ] is  a time-lag  kernel.  Each  element  of  a STFD 
matrix  is  defined  as: 

M N 

[p[n,k]]it=  E E Cln-  p,m]Xi[p  + m]x* 

m=—M  p=—N 

x [p  - m]e-4j7r^.  (68) 

The  off-diagonal  elements  of  the  STFD  are  the  cross-sensor 
TFDs  and  the  diagonal  entries  are  the  standard  auto-sensor 
TFDs  of  the  multi-channel  non-stationary  signals.  Such  STFD- 
based  methods  can  exploit  both  the  non-stationary  character- 
istics of  the  signals  and  the  spatial  diversity  provided  by  the 
multiple  sensors.  This  is  of  course  at  the  expense  of  complex- 
ity and  a large  number  of  features. 

The  aforementioned  feature  extraction  approaches  can  be  easily 
adapted  for  multi-modal  signals  by  interpreting  signals  of  other 
modality  acquired  from  other  sensors  as  separate  channels. 

7.  Conclusion 

This  tutorial  review  paper  describes  two  key  methods  and 
techniques  needed  to  design  (t,  /)  pattern  recognition  techniques 
suitable  for  the  classification  of  non-stationary  signals  such  as 
those  encountered  in  physiological  measurements.  The  two  prin- 
cipal stages  include: 

1.  Design  of  high-resolution  TFDs  for  better  extraction  of  infor- 
mation defining  discriminatory  features. 

2.  Definition  of  new  (t,  /)  features  by  extending  traditional  time- 
domain  or  frequency-domain  features  to  (t,  f)  features  such  as 
(t,  f)  flatness  and  (t,  /)  flux. 

The  performance  of  the  ( t , f)  features  is  evaluated  using  ROC  anal- 
ysis. The  findings  show  that  (t,  f)  features  perform  significantly 
better  than  frequency-domain  features;  and  high-resolution  TFDs 
result  in  (t,  f)  features  with  higher  AUC.  The  other  findings  of  this 
study  with  respect  to  TFD  design  are: 

1.  The  Spectrogram  is  easy  to  use  as  it  depends  only  on  the  win- 
dow size  and  shape;  but  it  is  too  sensitive  to  the  window 
variations.  Improvements  on  the  Spectrogram  require  the  use 
of  separable  kernel  TFDs  with  more  control  parameters  as  in 
the  EMBD,  CKD,  multi-view  TFD,  adaptive  TFD. 

2.  LI-TFDs  are  more  suitable  for  signals  that  can  be  modeled  as 
pure  tones,  i.e.,  signals  with  energy  concentration  parallel  to 
the  time  axis  in  the  (t,  /)  domain  or  small  deviation  from  the 
time  axis;  this  includes,  e.g.,  some  EEG  seizure  signals. 

3.  DI-TFDs  are  more  suitable  for  signals  that  can  be  modeled  as 
a train  of  impulses,  i.e.,  signals  whose  energy  concentration 
is  parallel  to  the  frequency  axis  in  the  (t,  /)  domain;  this  in- 
cludes, e.g.,  spike  signals. 

4.  Separable-kernel  TFDs  have  more  degrees  of  freedom  to  adapt 
their  smoothing  kernels  as  compared  to  the  Spectrogram,  DI- 
TFDs  and  LI  TFDs.  They  can  be  optimized  for  both  signals 
composed  of  spikes  as  well  as  pure  tones. 

5.  The  CKD  is  the  best  performing  separable-kernel  TFD  as  it  has 
more  control  parameters  to  adapt  both  the  shape  and  size  of 
its  smoothing  kernel.  It  is  an  extension  of  the  EMBD  which 
is  itself  an  extension  of  the  MBD,  a modified  version  of  the 
BD.  This  best  performance  is  obtained  at  the  cost  of  greater 
complexity. 

6.  The  adaptive  TFD  is  an  extra  refinement  that  is  especially  suit- 
able for  signals  that  have  more  than  one  direction  of  energy 
distribution  in  the  (t,  /)  plane.  It  can  be  applied  to  the  best 
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high-resolution  TFD  selected  for  a large  class  of  signals;  its 
improved  performance  is  obtained  at  the  expense  of  an  ad- 
ditional computational  cost. 

In  order  to  further  improve  the  performance  of  existing  ( t , /)  ap- 
proaches with  respect  to  pattern  recognition  and  machine  learning 
requirements,  several  additional  investigations  are  needed  includ- 
ing: 

1.  Developing  algorithms  for  the  automatic  design  of  data  de- 
pendent high-resolution  TFDs  and  IF  estimation,  taking  into 
account  directional  filtering  of  TFDs  on  the  basis  of  energy 
concentration  location  criteria. 

2.  Investigating  the  further  use  of  directional  filtering  to  extract 
texture  related  features. 

3.  Applying  (t,  /)  image  segmentation  techniques  to  extract  geo- 
metric features  to  facilitate  1)  and  2)  [78]. 

4.  Investigating  the  design  of  TFDs  for  compressed  sensing  [79]. 

The  results  from  these  additional  investigations  should  help  bring 
these  techniques  to  a wider  range  of  applications,  and  enable  ad- 
vanced machine  learning  in  a ( t , /)  domain.  In  particular,  the 
above  methodologies  and  results  presented  in  this  paper  indicate 
that  in  the  context  of  detecting  abnormalities  in  newborn  brains, 
systems  that  aid  experts  decision  making  can  be  implemented  for 
use  in  neonatal  intensive  care  units  on  a trial  basis.  The  same  is 
true  for  other  applications  such  as  condition  monitoring  and  fault 
detection  in  a wide  range  of  industrial  applications. 

The  Appendices  A-E  are  provided  for  completeness  and  to  en- 
sure that  the  results  presented  can  be  reproduced  independently 
by  the  reader. 
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Appendix  A.  Resolution  performance  measure  used 
for  comparisons  of  TFDs 

The  normalized  performance  measure  P introduced  in  [38] 
takes  into  account  the  proportion  of  cross-terms  in  the  TFD  and 
provides  a measure  of  resolution  of  the  TFD,  as  illustrated  in 
Fig.  28.  The  measure  ‘P’  is  defined  as: 


Fig.  28.  The  two  dominant  peaks  are  the  signal  components;  the  middle  peak  is  the 
cross-term,  and  the  other  peaks  are  side  lobes. 

followed  by  signal  classification  using  a trained  classifier.  The  cost 
of  feature  selection  is  not  considered  as  this  step  is  part  of  of- 
fline training.  The  computational  cost  of  implementing  a TFD  is 
0(NM  log(M)),  where  N is  the  number  of  samples  of  the  signal 
being  considered  and  M is  the  number  of  sample  points  in  the 
FFT  computations.  The  computation  of  a TFD  requires  N FT  op- 
erations (one  at  each  time  instant)  and  each  FT  is  of  the  order 
0(Mlog(M)).  The  computational  complexity  of  feature  extraction 
in  general  is  0 (NM)  as  there  are  NM  points  in  the  (t,  /)  plane 
and  the  number  of  mathematical  operations  required  to  extract 
any  given  feature  is  proportional  to  the  total  number  of  points. 
The  computational  complexity  of  abnormality  detection  based  on 
the  trained  classifier  is  0(Nf );  where  Nf  is  the  total  number 
of  features.  The  total  computational  cost  of  online  computation 
is  therefore  0(NMlog(M) -h  NM -h  Nf)  or  simply  0(NMlog(M)), 
given  that,  in  the  analysis  of  algorithms,  computational  costs  of 
higher  order  is  only  considered  as  per  general  practice  [80].  The 
above  analysis  indicates  that  the  TFD  computation  is  the  major 
bottleneck  in  trying  to  achieve  an  efficient  real-time  implementa- 
tion. Further  optimization  can  be  made  by  using  specially  designed 
computational  and  memory  efficient  algorithms  that  have  been  de- 
veloped for  TFD  implementation  [69],  implementing  ( t , f)  kernels 
as  multi-taper  Spectrograms  [81],  and  using  efficient  hardware  im- 
plementations such  as  field-programmable  gate  arrays. 


Astt) 

Am(t) 


1 

2 Am(t) 


(69) 


where  Am(t)  and  As(t)  are  respectively  the  average  amplitudes  of 
the  components’  mainlobes  and  sidelobes,  Ax(t)  is  the  cross-term 
amplitude.  The  parameter  S(t)  = (/2(f)  - - (/i(0  + 

represents  the  component’s  separation  measure  and  D(t ) = /2(f)  — 
/1  (t)  is  the  distance  between  the  peaks  of  the  two  components. 


Appendix  C.  Computer  programs  in  this  study 

All  simulations  have  been  carried  out  using  Matlab  and  the  TF- 
SAP  tool-box.  The  TFSAP  tool-box  can  be  downloaded  from  the  fol- 
lowing link.  http://espace.library.uq.edu.aU/view/uq:211321,  http:// 
faculty.qu.edu.qa/boualem/tfsa.aspx,  and/or  http://www.time- 
frequency.net.  The  corresponding  code  can  be  found  in  [82,33]. 


Appendix  B.  Computational  complexity  and  real-time 
implementations 

The  implementation  of  the  abnormality  detection  methodolo- 
gies described  above  involves  two  stages:  offline  training  and  on- 
line implementation.  The  computational  complexity  of  the  online 
implementation  is  often  a major  bottle  neck  in  practical  realiza- 
tions of  pattern  recognition  schemes.  For  the  algorithms  described 
above,  this  involves  TFDs  computations  69],  features  extraction, 


Appendix  D.  Time-frequency  extension  of  time-domain  features 

Table  5 gives  the  list  of  (t,  /)  features  that  are  obtained  by 
translation  of  time-only  features  to  the  (t,  /)  domain. 

Appendix  E.  Medical  data  used  in  this  study 

Supplementary  material  related  to  this  article  can  be  found  on- 
line at  http://dx.doi.Org/10.1016/j.dsp.2014.12.015. 
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Table  5 

Time-frequency  extension  of  time-domain  features. 


Feature 

Time- 

domain  representation 

(t,  /)  extension 

Mean 

m(t)  = 

m(tJ)  = 

Variance 

a(t)  = 

h EjLi  <SM  - mrn)2 

a(t,f)  = 

m E?=i  EkL  1 (Pin,  k]  - m(tj))2 

Skewness 

Y(t)  = 

iVCT(t) 

= 

1 y^N  y^M 

NMcr (3f  f)  ^n= 1 2^k=; 

i (p[n,k\-m(tJ))3 

Kurtosis 

k(t ) - 

T.L  1 <SM  - m®)4 

k(t,f ) = 

1 y^N  y^M 

NMcr*tf)  ^n= 1 ^k=1 

( p[n,k\-m(tJ) )4 

Coefficient  of  variation 

c(t)  - 

°V) 

m(t) 

c(tJ)  = 
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